Directed evolution method

ABSTRACT

We describe a method of selecting an enzyme having replicase activity, the method comprising the steps of: (a) providing a pool of nucleic acids comprising members each encoding a replicase or a variant of the replicase; (b) subdividing the pool of nucleic acids into compartments, such that each compartment comprises a nucleic acid member of the pool together with the replicase or variant encoded by the nucleic acid member; (c) allowing nucleic acid replication to occur; and (d) detecting amplification of the nucleic acid member by the replicase. Methods for selecting agents capable of modulating replicase activity, and for selecting interacting polypeptides are also disclosed.

This application is a divisional of U.S. Ser. No. 10/387,387, filed Mar. 13, 2003, which is a continuation-in-part of international application PCT/GB01/04108, filed Sep. 13, 2001, which claims the priority of each of Great Britain application GB 0022458.4, filed Sep. 13, 2000, U.S. provisional application 60/283,771, filed Apr. 13, 2001 and U.S. provisional application 60/285,501, filed Apr. 20, 2001. Each of these priority documents is expressly incorporated herein in its entirety, including tables and drawings.

FIELD OF THE INVENTION

The present invention relates to methods for use in in vitro evolution of molecular libraries. In particular, the present invention relates to methods of selecting nucleic acids encoding gene products in which the nucleic acid and the activity of the encoded gene product are linked by compartmentalisation.

BACKGROUND TO THE INVENTION

Evolution requires the generation of genetic diversity (diversity in nucleic acid) followed by the selection of those nucleic acids which encode beneficial characteristics. Because the activity of the nucleic acids and their encoded gene product are physically linked in biological organisms (the nucleic acids encoding the molecular blueprint of the cells in which they are confined), alterations in the genotype resulting in an adaptive change(s) of phenotype produce benefits for the organism resulting in increased survival and offspring. Multiple rounds of mutation and selection can thus result in the progressive enrichment of organisms (and the encoding genotype) with increasing adaptation to a given selection condition. Systems for rapid evolution of nucleic acids or proteins in vitro must mimic this process at the molecular level in that the nucleic acid and the activity of the encoded gene product must be linked and the activity of the gene product must be selectable.

In vitro selection technologies are a rapidly expanding field and often prove more powerful than rational design to obtain biopolymers with desired properties. In the past decade selection experiments, using e.g. phage display or SELEX technologies, have yielded many novel polynucleotide and polypeptide ligands. Selection for catalysis has proved harder. Strategies have included binding of transition state analogues, covalent linkage to suicide inhibitors, proximity coupling and covalent product linkage. Although these approaches focus only on a particular part of the enzymatic cycle, there have been some successes. Ultimately, however, it would be desirable to select directly for catalytic turnover. Indeed, simple screening for catalytic turnover of fairly small mutant libraries has been rather more successful than the various selection approaches and has yielded some catalysts with greatly improved catalytic rates.

While polymerases are a prerequisite for technologies that define molecular biology, i.e. site-directed mutagenesis, cDNA cloning and in particular Sanger sequencing and PCR, they often suffer from serious shortcomings due to the fact that they are made to perform tasks for which nature has not optimized them. Few attempts appear to have been made to improve the properties of polymerases available from nature and to tailor them for specific applications by protein engineering. Technical advances have been largely peripheral, and include the use of polymerases from a wider range of organisms, buffer and additive systems as well as enzyme blends.

Attempts to improve the properties of polymerases have traditionally relied on protein engineering. For example, variants of Taq polymerase (for example, Stoffel fragment and Klentaq) have been generated by full or partial deletion of its 5′-3′ exonuclease domain and show improved thermostability and fidelity although at the cost of reduced processivity (Barnes 1992, Gene 112, 29-35; Lawyer et al., 1993, PCR Methods and Applications 2, 275). In addition, the availability of high-resolution structures for proteins has allowed the rational design of mutants with improved properties (for example, Taq mutants with improved properties of dideoxynucleotide incorporation for cycle sequencing, Li et al., 1999, Proc. Natl. Acad. Sci. USA 96, 9491). In vivo genetic approaches have also been used for protein design, for example by complementation of a polA strain to select for active polymerases from repertoires of mutant polymerases (Suzuki et al., 1996, Proc. Natl. Acad. Sci. USA 93, 9670). However, the genetic complementation approach is limited in the properties that can be selected for.

Recent advances in molecular biology have allowed some molecules to be co-selected in vitro according to their properties along with the nucleic acids that encode them. The selected nucleic acids can subsequently be cloned for further analysis or use, or subjected to additional rounds of mutation and selection. Common to these methods is the establishment of large libraries of nucleic acids. Molecules having the desired characteristics (activity) can be isolated through selection regimes that select for the desired activity of the encoded gene product, such as a desired biochemical or biological activity, for example binding activity.

WO99/02671 describes a method for isolating one or more genetic elements encoding a gene product having a desired activity. Genetic elements are first compartmentalised into microcapsules, and then transcribed and/or translated to produce their respective gene products (RNA or protein) within the microcapsules. Alternatively, the genetic elements are contained within a host cell in which transcription and/or translation (expression) of the gene product takes place and the host cells are first compartmentalised into microcapsules. Genetic elements which produce gene product having desired activity are subsequently sorted. The method described in WO99/02671 relies on the gene product catalytically modifying the microcapsule or the genetic element (or both), so that enrichment of the modified entity or entities enables selection of the desired activity.

SUMMARY OF THE INVENTION

According to a first aspect of the present invention, we provide a method of selecting a nucleic acid-processing (NAP) enzyme, the method comprising the steps of: (a) providing a pool of nucleic acids comprising members encoding a NAP enzyme or a variant of the NAP enzyme; (b) subdividing the pool of nucleic acids into compartments, such that each compartment comprises a nucleic acid member of the pool together with the NAP enzyme or variant encoded by the nucleic acid member; (c) allowing nucleic acid processing to occur; and (d) detecting processing of the nucleic acid member by the NAP enzyme.

There is provided, according to a second aspect of the present invention, a method of selecting an agent capable of modifying the activity of a NAP enzyme, the method comprising the steps of: (a) providing a NAP enzyme; (b) providing a pool of nucleic acids comprising members encoding one or more candidate agents; (c) subdividing the pool of nucleic acids into compartments, such that each compartment comprises a nucleic acid member of the pool, the agent encoded by the nucleic acid member, and the NAP enzyme; and (d) detecting processing of the nucleic acid member by the NAP enzyme.

Preferably, the agent is a promoter of NAP enzyme activity. The agent may be an enzyme, preferably a kinase or a phosphorylase, which is capable of acting on the NAP enzyme to modify its activity. The agent may be a chaperone involved in the folding or assembly of the NAP enzyme or required for the maintenance of replicase function (e.g. telomerase, HSP 90). Alternatively, the agent may be a polypeptide or polynucleotide involved in a metabolic pathway, the pathway having as an end product a substrate which is involved in a replication reaction. The agent may moreover be any enzyme which is capable of catalysing a reaction that modifies an inhibiting agent (natural or unnatural) of the NAP enzyme in such a way as to reduce or abolish its inhibiting activity. Finally, the agent may promote NAP activity in a non-catalytic way, e.g. by association with the NAP enzyme or its substrate etc. (e.g. processivity factors in the case of DNA polymerases, e.g. T7 DNA polymerase and thioredoxin).

We provide, according to a third aspect of the present invention, a method of selecting a pair of polypeptides capable of stable interaction, the method comprising: (a) providing a first nucleic acid and a second nucleic acid, the first nucleic acid encoding a first fusion protein comprising a first subdomain of a NAP enzyme fused to a first polypeptide, the second nucleic acid encoding a second fusion protein comprising a second subdomain of a NAP enzyme fused to a second polypeptide; in which stable interaction of the first and second NAP enzyme subdomains generates NAP enzyme activity, and in which at least one of the first and second nucleic acids is provided in the form of a pool of nucleic acids encoding variants of the respective first and/or second polypeptide(s); (b) subdividing the pool or pools of nucleic acids into compartments, such that each compartment comprises a first nucleic acid and a second nucleic acid together with respective fusion proteins encoded by the first and second nucleic acids; (c) allowing the first polypeptide to bind to the second polypeptide, such that binding of the first and second polypeptides leads to stable interaction of the NAP enzyme subdomains to generate NAP enzyme activity; and (d) detecting processing of at least one of the first and second nucleic acids by the NAP enzyme.

Moreover, the NAP enzyme domains referred to in (a) above may be replaced with domains of a polypeptide capable of modifying the activity of NAP enzymes, as discussed in the second aspect of the present invention, and NAP enzyme activity used to select such modifying polypeptides having desired properties.

Preferably, each of the first and second nucleic acids is provided from a pool of nucleic acids.

Preferably, the first and second nucleic acids are linked either covalently (e.g. as part of the same template molecule) or non-covalently (e.g. by tethering onto beads etc.).

NAP enzymes may for example be polypeptide or ribonucleic acid enzyme molecules. In a highly preferred embodiment, the NAP enzyme according to the invention is a replicase enzyme, i.e. an enzyme which is capable of amplifying nucleic acid from a template, such as for example a polymerase enzyme (or ligase). The invention is described herein below with specific reference to replicases; however, it will be understood by those skilled in the art that the invention is equally applicable to other NAP enzymes, such as telomerases and helicases, as further set out below, which process nucleic acids in ways not limited to amplification but which are nevertheless selectable by detecting nucleic acid amplification, i.e. which promote replication indirectly.

In a preferred embodiment of the invention, amplification of the nucleic acid results from more than one round of nucleic acid replication. Preferably, the amplification of the nucleic acid is an exponential amplification.
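As a rough illustration of what exponential amplification over more than one round of replication implies for copy number, the following sketch uses the common textbook idealisation N = N0 × (1 + e)^n, where e is a per-cycle efficiency between 0 and 1. This model, the function name and the example values are assumptions made for illustration only; the specification does not prescribe them.

```python
def copies_after_cycles(initial_copies: float, efficiency: float, cycles: int) -> float:
    """Idealised exponential amplification.

    Assumption (not from the specification): every cycle multiplies the
    copy number by (1 + efficiency), with efficiency in the range 0..1.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical comparison of a fully efficient and a half-efficient replicase.
    for eff in (1.0, 0.5):
        print(f"efficiency {eff}: {copies_after_cycles(1, eff, 20):.0f} copies after 20 cycles")
```

Under this idealisation, modest differences in per-cycle efficiency compound over successive rounds, which is one reason multi-round amplification can separate variants of differing activity.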

The amplification reaction is preferably selected from the following: a polymerase chain reaction (PCR), a reverse transcriptase-polymerase chain reaction (RT-PCR), a nested PCR, a ligase chain reaction (LCR), a transcription-based amplification system (TAS), a self-sustaining sequence replication (3SR), NASBA, a transcription-mediated amplification reaction (TMA), and a strand-displacement amplification (SDA).

In a highly preferred embodiment, the post-amplification copy number of the nucleic acid member is substantially proportional to the activity of the replicase, the activity of a requisite agent, or the binding affinity and/or binding kinetics of the first and second polypeptides.

Nucleic acid replication may be detected by assaying the copy number of the nucleic acid member. Alternatively, or in addition, nucleic acid replication may be detected by determining the activity of a polypeptide encoded by the nucleic acid member.

In a highly preferred embodiment, the conditions in the compartment are adjusted to select for a replicase or agent active under such conditions, or a pair of polypeptides capable of stable interaction under such conditions.

The replicase preferably has polymerase, reverse transcriptase or ligase activity.

The polypeptide may be provided from the nucleic acid by in vitro transcription and translation. Alternatively, the polypeptide may be provided from the nucleic acid in vivo in an expression host.

In a preferred embodiment, the compartments consist of the encapsulated aqueous component of a water-in-oil emulsion. The water-in-oil emulsion is preferably produced by emulsifying an aqueous phase with an oil phase in the presence of a surfactant comprising 4.5% v/v Span 80, 0.4% v/v Tween 80 and 0.1% v/v Triton X100, or a surfactant comprising Span 80, Tween 80 and Triton X100 in substantially the same proportions. Preferably, the water:oil phase ratio is 1:2, which leads to adequate droplet size. Such emulsions have a higher thermal stability than more oil-rich emulsions.

As a fourth aspect of the present invention, there is provided a replicase enzyme identified by a method according to any preceding claim. Preferably, the replicase enzyme has a greater thermostability than a corresponding unselected enzyme. More preferably, the replicase enzyme is a Taq polymerase having more than 10 times increased half-life at 97.5° C. when compared to wild-type Taq polymerase.

The replicase enzyme may have a greater tolerance to heparin than a corresponding unselected enzyme. Preferably, the replicase enzyme is a Taq polymerase active at a concentration of 0.083 units/μl or more of heparin.

The replicase enzyme may be capable of extending a primer having a 3′ mismatch. Preferably, the 3′ mismatch is a 3′ purine-purine mismatch or a 3′ pyrimidine-pyrimidine mismatch. More preferably, the 3′ mismatch is an A-G mismatch or the 3′ mismatch is a C-C mismatch.

We provide, according to a fifth aspect of the present invention, a Taq polymerase mutant comprising the mutations (amino acid substitutions): F73S, R205K, K219E, M236T, E434D and A608V.

The present invention, in a sixth aspect, provides a Taq polymerase mutant comprising the mutations (amino acid substitutions): K225E, E388V, K540R, D578G, N583S and M747R.

The present invention, in a seventh aspect, provides a Taq polymerase mutant comprising the mutations (amino acid substitutions): G84A, D144G, K314R, E520G, A608V and E742G.

The present invention, in an eighth aspect, provides a Taq polymerase mutant comprising the mutations (amino acid substitutions): D58G, R74P, A109T, L245R, R343G, G370D, E520G, N583S, E694K and A743P.
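The substitutions recited in the fifth to eighth aspects follow the usual wild-type residue, position, mutant residue notation (for example, F73S denotes phenylalanine at position 73 replaced by serine). The sketch below shows one way such codes might be parsed and applied to a parent sequence. The helper names are hypothetical, the three-residue test sequence is a toy placeholder rather than any part of the Taq sequence, and 1-based numbering is assumed.

```python
import re
from typing import Iterable, NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_CODE = re.compile(rf"^([{AMINO_ACIDS}])([1-9]\d*)([{AMINO_ACIDS}])$")


class Substitution(NamedTuple):
    wild_type: str  # residue expected in the parent sequence
    position: int   # residue number, assumed 1-based
    mutant: str     # replacement residue


def parse_substitution(code: str) -> Substitution:
    """Split a code such as 'F73S' into its three parts."""
    match = _CODE.match(code.strip())
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def apply_substitutions(sequence: str, codes: Iterable[str]) -> str:
    """Apply substitutions, checking that each wild-type residue matches."""
    residues = list(sequence)
    for code in codes:
        sub = parse_substitution(code)
        if sub.position > len(residues):
            raise ValueError(f"{code}: position beyond end of sequence")
        found = residues[sub.position - 1]
        if found != sub.wild_type:
            raise ValueError(f"{code}: found {found} at position {sub.position}")
        residues[sub.position - 1] = sub.mutant
    return "".join(residues)


# Substitution lists as recited in the fifth and seventh aspects.
FIFTH_ASPECT = ["F73S", "R205K", "K219E", "M236T", "E434D", "A608V"]
SEVENTH_ASPECT = ["G84A", "D144G", "K314R", "E520G", "A608V", "E742G"]

# Substitutions common to both lists.
shared = sorted(set(FIFTH_ASPECT) & set(SEVENTH_ASPECT))
```

Checking the wild-type residue before substituting guards against applying a mutation list to the wrong parent sequence or with an offset numbering scheme.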

In a ninth aspect of the present invention, there is provided a water-in-oil emulsion obtainable by emulsifying an aqueous phase with an oil phase in the presence of a surfactant comprising 4.5% v/v Span 80, 0.4% v/v Tween 80 and 0.1% v/v Triton X100, or a surfactant comprising Span 80, Tween 80 and Triton X100 in substantially the same proportions. Preferably, the water:oil phase ratio is 1:2. This ratio appears to permit diffusion of dNTPs (and presumably other small molecules) between compartments at higher temperatures, which is beneficial for some applications but not for others. Diffusion can be controlled by adjusting the water:oil phase ratio to 1:4, i.e. by increasing the proportion of oil.
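The proportions recited above can be turned into a simple volume calculation. The sketch below is illustrative only: it assumes that the surfactant percentages are expressed relative to the volume of the oil phase, which the text does not state explicitly, and the function name and the "carrier oil" label are hypothetical.

```python
# Surfactant fractions (v/v) as recited in the text; taking them relative
# to the oil phase volume is an assumption made for this sketch.
SURFACTANT_FRACTIONS = {"Span 80": 0.045, "Tween 80": 0.004, "Triton X100": 0.001}


def emulsion_recipe(aqueous_ul: float, water_parts: float = 1.0, oil_parts: float = 2.0) -> dict:
    """Return component volumes (in microlitres) for a given aqueous volume.

    The default water:oil phase ratio is the preferred 1:2; pass
    oil_parts=4.0 for the 1:4 ratio mentioned for controlling diffusion.
    """
    if aqueous_ul <= 0 or water_parts <= 0 or oil_parts <= 0:
        raise ValueError("volumes and ratio parts must be positive")
    oil_phase_ul = aqueous_ul * oil_parts / water_parts
    recipe = {"aqueous phase": aqueous_ul, "oil phase (total)": oil_phase_ul}
    for name, fraction in SURFACTANT_FRACTIONS.items():
        recipe[name] = oil_phase_ul * fraction
    # Remainder of the oil phase; "carrier oil" is a placeholder label.
    recipe["carrier oil"] = oil_phase_ul * (1.0 - sum(SURFACTANT_FRACTIONS.values()))
    return recipe


if __name__ == "__main__":
    for oil_parts in (2.0, 4.0):
        print(f"water:oil 1:{oil_parts:g}")
        for component, volume in emulsion_recipe(200.0, oil_parts=oil_parts).items():
            print(f"  {component}: {volume:.1f} ul")
```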

In another aspect, the NAP enzyme is a replicase enzyme that has an enhanced capability to replicate substrates 23 kb in size or greater in the absence of processivity factors or a 3′-5′ exonuclease proof-reading domain.

As used herein, the phrase “variant of a nucleic acid processing enzyme” means a NAP enzyme whose amino acid sequence (for polypeptide enzymes) or nucleotide sequence (for ribozymes) differs from a naturally occurring sequence of that NAP enzyme by at least one amino acid (for polypeptide enzymes) or nucleotide (for ribozymes). A variant NAP enzyme catalyzes a reaction catalyzed by the corresponding wild-type NAP enzyme.

As used herein, the phrase “modifying the activity of a NAP enzyme” means causing the activity of a NAP enzyme to increase or decrease, or changing another aspect of the enzyme's activity, such as substrate identity or substrate specificity, the reaction catalyzed, cofactor dependence, optimal salt, buffer or temperature conditions, temperature stability, proofreading capacity, interaction with other proteins or enzymes, or sensitivity to inhibition.

As used herein, the phrase “enhancing the activity of a NAP enzyme” or “increasing the activity of a NAP enzyme” means increasing the amount of a product of a reaction catalyzed by a NAP enzyme in the presence of an enhancing stimulus under a particular set of conditions by at least 10% relative to the amount formed under similar conditions in the absence of the enhancing stimulus.

As used herein, the phrase “promoter of NAP enzyme activity” refers to an agent that increases the activity of a given NAP enzyme.

As used herein, the phrase “polypeptide that produces a substrate in a nucleic acid processing reaction” means a polypeptide enzyme that catalyzes a reaction resulting in the production of a substrate for a NAP enzyme. A non-limiting example of a polypeptide that produces a substrate in a nucleic acid processing reaction is a nucleoside diphosphate kinase, which catalyzes the phosphorylation of deoxynucleoside diphosphates to deoxynucleoside triphosphates, which are substrates for NAP enzymes such as DNA polymerases.

As used herein, the phrase “polypeptide that consumes an inhibitor in a nucleic acid processing reaction” means a polypeptide enzyme that catalyzes a reaction resulting in the inactivation of an inhibitor of a NAP enzyme reaction. A non-limiting example of a polypeptide that consumes an inhibitor in a nucleic acid processing reaction is a heparinase. Heparin is an inhibitor of polymerase activity, and heparinase enzymes break down heparin, thereby consuming the inhibitor molecule.

As used herein, the phrase “polypeptide that modifies a nucleotide primer or nucleoside triphosphate substrate used in a nucleic acid processing reaction” means a polypeptide enzyme that catalyzes a chemical modification of a nucleotide primer or a nucleoside triphosphate substrate for a NAP enzyme, the modification permitting the nucleotide primer or nucleoside triphosphate to participate in a reaction catalyzed by the NAP enzyme.

As used herein, the phrase “substrate appendage added to a nucleotide primer or nucleoside triphosphate” means a chemical moiety, added to a nucleotide primer or nucleoside triphosphate, that is acted upon by an enzyme that “modifies a nucleotide primer or nucleoside triphosphate” as that term is defined herein above. Most often, such a “substrate appendage” is inhibitory to the activity of a NAP enzyme on that primer or nucleoside triphosphate.

As used herein, the phrase “stable interaction” means a physical interaction between two polypeptides. As the term is used herein, a “stable interaction” between two polypeptide sequences fused to respective, separate NAP enzyme subdomain polypeptides is an interaction that permits the respective NAP enzyme subdomains that do not alone catalyze a reaction catalyzed by the intact NAP enzyme to together catalyze a reaction that is catalyzed by the intact NAP enzyme.

As used herein, the phrase “subdomain of a NAP enzyme” means a portion of a NAP enzyme polypeptide, which portion, separate and on its own, does not have catalytic activity, but which, when brought into physical contact with another polypeptide comprising another portion of that NAP enzyme, reconstitutes a functional NAP enzyme capable of catalyzing a reaction catalyzed by the intact NAP enzyme that is not catalyzed by either portion of the NAP enzyme on its own. Non-limiting examples of subdomains of a NAP enzyme are described by Vainshtein et al., 1996, Protein Science 5: 1785.

As used herein, the phrase “stable interaction of first and second NAP subdomains generates processing activity” means that the physical interaction of two separate subdomains of a NAP enzyme, as the term is defined herein, reconstitutes a catalytic activity of the intact NAP enzyme that is not possessed by either the first or second NAP subdomains on their own.

As used herein, the phrase “subdomain of a polypeptide that enhances the activity of a NAP enzyme” means a portion of a polypeptide that, when intact, enhances the activity of a NAP enzyme. As it is used herein, the “subdomain” of such a polypeptide is a portion that does not, on its own, enhance the activity of a NAP enzyme, but when in physical contact with another portion of that enhancing polypeptide, reconstitutes NAP activity enhancement.

As used herein, the phrase “stable folding” means that a polypeptide assumes a tertiary structure that exhibits a sigmoidal thermal denaturation curve. A “non-folded or improperly folded polypeptide” is non-functional relative to a properly folded polypeptide and tends to aggregate and precipitate.

As used herein, the phrase “poorly folding polypeptide” means a polypeptide that tends to aggregate and precipitate unless it is permitted to fold in the presence of a chaperone. Fusion of a “poorly folding polypeptide” to a NAP enzyme will inhibit the activity of that NAP enzyme unless the fusion polypeptide is folded in the presence of a chaperone.

As used herein, the phrase “replication of a nucleic acid member” means the template-directed addition of at least one nucleotide to a nucleic acid substrate of a NAP enzyme. That is, “replication” as the term is used herein encompasses template-directed replication of an entire nucleic acid molecule, as well as template-directed replication of less than an entire nucleic acid molecule.

As used herein, the term “proportional” refers to a direct numerical relationship between two measurable quantities, such as the activity of an enzyme and the amount of product of the reaction catalyzed by that enzyme. The phrase “substantially proportional” encompasses a proportional relationship between two measurable quantities as well as a relationship that varies from direct proportion by 20% or less. For example, where a doubling of the rate of enzyme activity would result in a doubling of the amount of product produced per unit time in a directly proportional relationship (100% increase in each of enzyme activity and product produced), an increase of 80% to 120% in the amount of product produced would be considered “substantially proportional.”
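The worked example in the definition above (a 100% increase in activity accompanied by an 80% to 120% increase in product) can be expressed as a simple check. The sketch below is one possible reading of the definition, in which the deviation is measured relative to the increase expected under direct proportion; the function name and its parameters are hypothetical.

```python
def is_substantially_proportional(
    activity_increase_pct: float,
    product_increase_pct: float,
    tolerance_pct: float = 20.0,
) -> bool:
    """Check whether a product increase is within tolerance of direct proportion.

    Under direct proportion the product increase equals the activity
    increase. The observed increase may deviate from that expectation by
    at most tolerance_pct percent of the expected increase (an
    interpretation adopted for this sketch).
    """
    expected = activity_increase_pct
    deviation = abs(product_increase_pct - expected)
    # Written without division so that a zero expected increase is handled.
    return deviation * 100.0 <= tolerance_pct * abs(expected)


if __name__ == "__main__":
    # Doubling of activity (100% increase) against various product increases.
    for product in (70.0, 80.0, 100.0, 120.0, 130.0):
        print(product, is_substantially_proportional(100.0, product))
```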

As used herein, the phrase “tagging of the nucleic acid member” means covalently or non-covalently appending a detectable moiety to a nucleic acid. Non-limiting examples of tags include radiolabels, fluorescent moieties, antibodies and stretches of nucleotide sequence that permit detection with a nucleic acid probe, antibody, specific binding partner or enzyme.

As used herein, the phrase “unnatural 3′ base” means a nitrogenous base structure, comprised by the 3′ nucleotide of a nucleic acid, that does not occur on a nucleotide in nature.

As used herein, the phrase “enhanced capability to replicate substrates 23 kb in size” means that a mutated replicase enzyme replicates a substrate 23 kb in size at least 10% more efficiently than the non-mutated version of that replicase enzyme.

As used herein, the term “processivity factor” means a polypeptide that increases the amount of polymerization catalyzed by a polymerase each time the polymerase initiates. Processivity factors are well known in the art. Non-limiting examples of processivity factors include thioredoxin (increases processivity of bacteriophage T7 polymerase), PCNA (increases processivity of eukaryotic Pol δ) and the β subunit of DNA Pol III (DnaN; increases the processivity of bacterial Pol III).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a diagram showing an embodiment of a method according to the present invention as applied to selection of a self-evolving polymerase, in which gene copy number is linked to enzymatic turnover.

FIG. 1B is a diagram showing a general scheme of compartmentalised self-replication (CSR): 1) A repertoire of diversified polymerase genes is cloned and expressed in E. coli. Spheres represent active polymerase molecules. 2) Bacterial cells containing the polymerase and encoding gene are suspended in reaction buffer containing flanking primers and nucleotide triphosphates (dNTPs) and segregated into aqueous compartments. 3) The polymerase enzyme and encoding gene are released from the cell allowing self-replication to proceed. Poorly active polymerases (white hexagon) fail to replicate their encoding gene. 4) The “offspring” polymerase genes are released, rediversified and recloned for another cycle of CSR.

FIG. 2 is a diagram showing aqueous compartments of the heat-stable emulsion containing E. coli cells expressing green fluorescent protein (GFP) prior to (A, B), and after thermocycling (C), as imaged by light microscopy. (A, B) represent the same frame. (A) is imaged at 535 nm for GFP fluorescence and (B) in visible light to visualize bacterial cells within compartments. Smudging of the fluorescent bacteria in (A) is due to Brownian motion during exposure. Average compartment dimensions as determined by laser diffraction are given below.

FIG. 3A is a diagram showing crossover between emulsion compartments. Two standard PCR reactions, differing in template size (PCR1 (0.9 kb), PCR2 (0.3 kb)) and presence of Taq (PCR1: +Taq, PCR2: no enzyme), are amplified individually or combined. When combined in solution, both templates are amplified. When emulsified separately, prior to mixing, only PCR1 is amplified. M: φX174 HaeIII marker.

FIG. 3B is a diagram showing crossover between emulsion compartments. Bacterial cells expressing wild-type Taq polymerase (2.7 kb) or the Taq polymerase Stoffel fragment (poorly active under the buffer conditions) (1.8 kb) are mixed 1:1 prior to emulsification. In solution, the shorter Stoffel fragment is amplified preferentially. In emulsion, there is predominantly amplification of the wt Taq gene and only weak amplification of the Stoffel fragment (arrow). M: λHindIII marker.

FIG. 4 is a diagram showing details of an embodiment of a method according to the present invention as applied to selection of a self-evolving polymerase.

FIG. 5 is a diagram showing details of an embodiment of a method according to the present invention to select for incorporation of novel or unusual substrates.

FIG. 6 is a diagram showing selection of RNA having (intermolecular) catalytic activity using the methods of our invention.

FIG. 7 is a diagram showing a model of a Taq-DNA complex.

FIG. 8A: General scheme of a cooperative CSR reaction. Nucleoside diphosphate kinase (ndk) is expressed from a plasmid and converts deoxynucleoside diphosphates, which are not substrates for Taq polymerase, into deoxynucleoside triphosphates, which are. As soon as ndk has produced sufficient amounts of substrate, Taq can replicate the ndk gene.

FIG. 8B: Bacterial cells expressing wild-type ndk (0.8 kb) or an inactive truncated fragment (0.5 kb) are mixed 1:1 prior to emulsification. In solution, the shorter truncated fragment is amplified preferentially. In emulsion, there is predominantly amplification of the wt ndk gene and only weak amplification of the truncated fragment (arrow), indicating that in emulsion only active ndk genes producing substrate are amplified. M: HaeIII φX174 marker.

DETAILED DESCRIPTION OF THE INVENTION

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of chemistry, molecular biology, microbiology, recombinant DNA and immunology, which are within the capabilities of a person of ordinary skill in the art. Such techniques are explained in the literature. See, e.g., J. Sambrook, E. F. Fritsch, and T. Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second Edition, Books 1-3, Cold Spring Harbor Laboratory Press; B. Roe, J. Crabtree, and A. Kahn, 1996, DNA Isolation and Sequencing: Essential Techniques, John Wiley & Sons; J. M. Polak and James O'D. McGee, 1990, In situ Hybridization: Principles and Practice, Oxford University Press; M. J. Gait (Editor), 1984, Oligonucleotide Synthesis: A Practical Approach, IRL Press; and D. M. J. Lilley and J. E. Dahlberg, 1992, Methods in Enzymology: DNA Structure Part A: Synthesis and Physical Analysis of DNA, Academic Press. Each of these general texts is herein incorporated by reference.

Compartmentalised Self Replication

Our invention describes a novel selection technology, which we call CSR (compartmentalised self-replication). It has the potential to be expanded into a generic selection system for catalysis as well as macromolecular interactions.

In its simplest form CSR involves the segregation of genes coding for and directing the production of DNA polymerases within discrete, spatially separated, aqueous compartments of a novel heat-stable water-in-oil emulsion. Provided with nucleotide triphosphates and appropriate flanking primers, polymerases replicate only their own genes. Consequently, only genes encoding active polymerases are replicated, while inactive variants that cannot copy their genes disappear from the gene pool. By analogy to biological systems, among differentially adapted variants, the most active (the fittest) produce the most “offspring”, hence directly correlating post-selection copy number with enzymatic turnover.
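The enrichment logic described above can be sketched as a toy calculation. The model below is a deliberately idealised assumption and not a description of the experimental procedure: each gene is taken to sit alone in its compartment, there is no crossover between compartments, and the number of offspring copies is taken to be exactly proportional to a single activity value per variant. All names and numbers are hypothetical.

```python
from typing import Dict


def csr_round(frequencies: Dict[str, float], activities: Dict[str, float]) -> Dict[str, float]:
    """One idealised CSR round: reweight each variant's share by its activity."""
    weighted = {name: frequencies[name] * activities[name] for name in frequencies}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no variant in the pool has any activity")
    return {name: weight / total for name, weight in weighted.items()}


def enrich(frequencies: Dict[str, float], activities: Dict[str, float], rounds: int) -> Dict[str, float]:
    """Apply several idealised CSR rounds in succession."""
    for _ in range(rounds):
        frequencies = csr_round(frequencies, activities)
    return frequencies


if __name__ == "__main__":
    # Hypothetical starting pool with equal shares of three variants.
    pool = {"active": 1 / 3, "weak": 1 / 3, "inactive": 1 / 3}
    activity = {"active": 1.0, "weak": 0.1, "inactive": 0.0}
    for rounds in (1, 2, 3):
        shares = enrich(pool, activity, rounds)
        print(rounds, {name: round(share, 4) for name, share in shares.items()})
```

In this toy model an inactive variant drops out after a single round, while a weakly active variant is diluted progressively over successive rounds.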

CSR is not limited to polymerases but can be applied to a wide variety of enzymatic transformations, built around the “replicase engine”. For example, an enzyme “feeding” a polymerase which in turn replicates its gene may be selected. More complicated coupled cooperative reaction schemes can be envisioned in which several enzymes either produce replicase substrates or consume replicase inhibitors.

Polymerases occupy a central role in genome maintenance, transmission and expression of genetic information. Polymerases are also at the heart of modern biology, enabling core technologies such as mutagenesis, cDNA libraries, sequencing and the polymerase chain reaction (PCR). However, commonly used polymerases frequently suffer from serious shortcomings as they are used to perform tasks for which nature had not optimized them. Indeed, most advances have been peripheral, including the use of polymerases from different organisms, improved buffer and additive systems as well as enzyme blends. CSR is a novel selection system ideally suited for the isolation of “designer” polymerases for specific applications. Many features of polymerase function are open to “improvement” (e.g. processivity, substrate selection etc.). Furthermore, CSR is a tool to study polymerase function, e.g. to probe immutable regions, study components of the replisome etc. Moreover, CSR may be used for shotgun functional cloning of polymerases, straight from diverse, uncultured microbial populations.

CSR represents a novel principle of repertoire selection of polypeptides. Previous approaches have featured various “display” methods in which phenotype and genotype (polypeptide and encoding gene) are linked as part of a “genetic package” containing the encoding gene and displaying the polypeptide on the “outside”. Selection occurs via a step of affinity purification after which surviving clones are grown (amplified) in cells for further rounds of selection (with resulting biases in growth distorting selections). Further distortions result from differences in the display efficiencies between different polypeptides.

In another set of methods both polypeptide and encoding gene(s) are “packaged” within a cell. Selection occurs in vivo through the polypeptide modifying the cell in such a way that it acquires a novel phenotype, e.g. growth in presence of an antibiotic. As the selection pressure is applied on whole cells, such approaches tend to be prone to the generation of false positives. Furthermore, in vivo complementation strategies are limited in that selection conditions, and hence selectable phenotypes, cannot be freely chosen and are further constrained by limits of host viability.

In CSR, there is no direct physical linkage (covalent or non-covalent) between polypeptide and encoding gene. More copies of successful genes are “grown” directly and in vitro as part of the selection process.

CSR is applicable to a broad spectrum of DNA and RNA polymerases, indeed to all polypeptides (or polynucleotides) involved in replication or gene expression. CSR can also be applied to DNA and RNA ligases assembling their genes from oligonucleotide fragments.

CSR is the only selection system in which the turnover rate of an enzyme is directly linked to the post-selection copy-number of its encoding gene.

There is great interest in polynucleotide polymers with altered bases, altered sugars or even backbone chemistries. However, solid-phase synthesis can usually only provide relatively short polymers and naturally occurring polymerases unsurprisingly incorporate most analogues poorly. CSR is ideally suited for the selection of polymerases more tolerant of unnatural substrates in order to prepare polynucleotide polymers with novel properties for chemistry, biology and nanotechnology (e.g. DNA wires).

Finally, the heat-stable emulsion developed for CSR has applications on its own. With >10⁹ microcompartments/ml, emulsion PCR (ePCR) offers the possibility of parallel PCR multiplexing on an unprecedented scale with potential applications from gene linkage analysis to genomic repertoire construction directly from single cells. It may also have applications for large-scale diagnostic PCR applications like “Digital PCR” (Vogelstein and Kinzler, 1999, PNAS 96, 9236-9241). Compartmentalizing individual reactions can also even out competition among different gene segments that are amplified in either multiplex or random primed PCR and leads to a less biased distribution of amplification products. ePCR may thus provide an alternative to whole genome DOP-PCR (and related methodologies) or indeed be used to make DOP-PCR (and related methodologies) more effective.

The selection system according to our invention is based on self-replication in a compartmentalised system. Our invention relies on the fact that active replicases are able to replicate nucleic acids (in particular their coding sequences), while inactive replicases cannot. Thus, in the methods of our invention, we provide a compartmentalised system where a replicase in a compartment is substantially unable to act on any template other than the templates within that compartment; in particular, it cannot act to replicate a template within any other compartment. In highly preferred embodiments, the template nucleic acid within the compartment encodes the replicase. Thus, the replicase cannot replicate anything other than its coding sequence; the replicase is therefore “linked” to its coding sequence. As a result, in highly preferred embodiments of our invention, the final concentration of the coding sequence (i.e. copy number) is dependent on the activity of the enzyme encoded by it.

Our selection system as applied to selection of replicases has the advantage that it links catalytic turnover (k_cat/K_m) to the post-selection copy-number of the gene encoding the catalyst. Thus, compartmentalisation offers the possibility of linking genotype and phenotype of a replicase enzyme, as described in further detail below, by a coupled enzymatic reaction involving the replication of the gene or genes of the enzyme(s) as one of its steps.

The methods of our invention preferably make use of nucleic acid libraries, the nature and construction of which will be explained in greater detail below. The nucleic acid library comprises a pool of different nucleic acids, members of which encode variants of a particular entity (the entity to be selected). Thus, for example, as used to select for replicases, the methods of our invention employ a nucleic acid library or pool having members which encode the replicase or variants of the replicase. Each of the entities encoded by the various members of the library will have different properties, e.g., varying tolerance to heat or to the presence of inhibitory small molecules, or tolerance for base pair mismatches (as explained in further detail below). The population of nucleic acid variants therefore provides a starting material for selection, and is in many ways analogous to variation in a natural population of organisms caused by mutation.

According to our invention, the different members of the nucleic acid library or pool are sorted or compartmentalised into many compartments or microcapsules. In preferred embodiments, each compartment contains substantially one nucleic acid member of the pool (in one or several copies). In addition, the compartment also comprises the polypeptide or polynucleotide (in one or preferably several copies) encoded by that nucleic acid member (whether it is a replicase, an agent, a polypeptide, etc. as discussed below). The nature of these compartments is such that minimal or substantially no interchange of macromolecules (such as nucleic acids and polypeptides) occurs between different compartments. As explained in further detail below, highly preferred embodiments of our invention make use of aqueous compartments within water-in-oil emulsions. As explained above, any replicase activity present in the compartment (whether exhibited by the replicase, modified by an agent, or exhibited by the polypeptide acting in conjunction with another polypeptide) can only act on the template within the compartment.
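The requirement that each compartment contain substantially one library member can be illustrated numerically. The following is a minimal sketch only, assuming that library members are distributed randomly among compartments so that occupancy follows a Poisson distribution; that assumption, the function names and the example loading of 0.1 members per compartment are illustrative and are not taken from the description above.

```python
import math


def occupancy_fractions(mean_members_per_compartment: float) -> dict:
    """Fractions of compartments holding 0, exactly 1, or more than 1 library member.

    Assumes random (Poisson) loading; this is an illustrative assumption.
    """
    lam = mean_members_per_compartment
    if lam < 0:
        raise ValueError("mean occupancy must be non-negative")
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multiple = 1.0 - p_empty - p_single
    return {"empty": p_empty, "single": p_single, "multiple": p_multiple}


def single_among_occupied(mean_members_per_compartment: float) -> float:
    """Of the compartments that contain any member, the fraction containing exactly one."""
    fractions = occupancy_fractions(mean_members_per_compartment)
    occupied = 1.0 - fractions["empty"]
    if occupied == 0.0:
        raise ValueError("no compartment is occupied at zero mean occupancy")
    return fractions["single"] / occupied


if __name__ == "__main__":
    # Hypothetical loading: on average 0.1 library members per compartment.
    print(occupancy_fractions(0.1))
    print(single_among_occupied(0.1))
```

Under this assumption, dilute loading leaves most compartments empty but keeps the occupied ones largely clonal, which is the condition under which a replicase acts only on its own coding sequence.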

The conditions within the compartments may be varied in order to select for polypeptides active under these conditions. For example, where replicases are selected, the compartments may have an increased temperature to select for replicases with higher thermal stability. Furthermore, using the selection methods described here on fusion proteins comprising thermostable replicase and a protein of interest will allow the selection of thermally stable proteins.

A method for the incorporation of thermal stability into otherwise labile proteins of commercial importance is desirable with regard to their large-scale production and distribution. A reporter system has been described to improve protein folding by expressing proteins as fusions with green fluorescent protein (GFP) (Waldo et al., 1999, Nat. Biotechnol. 17, 691-695). The function of the latter is related to the productive folding of the fused protein influencing folding and/or functionality of the GFP, enabling the directed evolution of variants with improved folding and expression. According to this aspect of our invention, proteins are fused to a thermostable replicase (or an agent promoting replicase activity) and active fusions are selected in emulsion, as a method for evolving proteins with increased thermostability and/or solubility. Unstable variants of the fusion partner are expected to aggregate and precipitate prior to or during thermal cycling, thus compromising replicase activity within the respective compartments. Viable fusions will allow for self-amplification in emulsion, with the turnover rate being linked to the stability of the fusion partner.

In a related approach, novel or increased chaperonin activity may be evolved by coexpression of a library of chaperones together with a polymerase-polypeptide fusion protein, in which the protein moiety misfolds (under the selection conditions). Replication of the gene(s) encoding the chaperonin can only proceed after chaperonin activity has rescued polymerase activity in the polymerase-polypeptide fusion protein.

Thermostability of an enzyme may be measured by conventional means as known in the art. For example, the catalytic activity of the native enzyme may be assayed at a certain temperature as a benchmark. Enzyme assays are well known in the art, and standard assays have been established over the years. For example, incorporation of nucleotides by a polymerase is measured by, for example, the use of radiolabeled dNTPs such as dATP and filter binding assays as known in the art. The enzyme whose thermostability is to be assayed is preincubated at an elevated temperature and then its retained activity (for example, polymerase activity in the case of polymerases) is measured at a lower, optimum temperature and compared to the benchmark. In the case of Taq polymerase, the elevated temperature is 97.5° C.; the optimum temperature is 72° C. Thermostability may be expressed in the form of half-life at the elevated temperature (i.e. time of incubation at higher temperature over which polymerase loses 50% of its activity). For example, the thermostable replicases, fusion proteins or agents selected by our invention may have a half-life that is 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9×, 10× or more than that of the native enzyme. Most preferably, the thermostable replicases etc. have a half-life that is 11× or more when compared this way. Preferably, selected polymerases are preincubated at 95° C. or more, 97.5° C. or more, 100° C. or more, 105° C. or more, or 110° C. or more. Thus, in a highly preferred embodiment of our invention, we provide polymerases with increased thermostability which display a half-life at 97.5° C. that is 11× or more than that of the corresponding wild-type (native) enzyme.
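The half-life comparison described above may be worked through as follows. This is a sketch under a stated assumption, namely that activity is lost by simple first-order (exponential) decay during preincubation; the description does not specify a decay model, and the function names and example figures are hypothetical.

```python
import math


def half_life(incubation_minutes: float, residual_fraction: float) -> float:
    """Half-life estimated from one preincubation time and the activity fraction retained.

    Assumes first-order decay: residual = 0.5 ** (t / t_half).
    """
    if incubation_minutes <= 0:
        raise ValueError("incubation time must be positive")
    if not 0.0 < residual_fraction < 1.0:
        raise ValueError("residual fraction must lie strictly between 0 and 1")
    return incubation_minutes * math.log(2.0) / math.log(1.0 / residual_fraction)


def fold_half_life(variant_half_life: float, native_half_life: float) -> float:
    """Ratio of the selected enzyme's half-life to that of the native enzyme."""
    if native_half_life <= 0:
        raise ValueError("native half-life must be positive")
    return variant_half_life / native_half_life


if __name__ == "__main__":
    # Hypothetical measurements after 10 minutes of preincubation at the elevated temperature.
    native = half_life(10.0, 0.01)   # native enzyme retains 1% of benchmark activity
    variant = half_life(10.0, 0.60)  # selected variant retains 60%
    print(native, variant, fold_half_life(variant, native))
```

A single time point gives only a rough estimate; in practice several preincubation times would be fitted together.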

Resistance to an inhibitory agent, such as heparin in the case of polymerases, may also be assayed and measured as above. Resistance to inhibition may be expressed in terms of the concentration of the inhibitory factor. For example, in preferred embodiments of the invention, we provide heparin resistant polymerases that are active in heparin at concentrations of up to between 0.083 units/μl and 0.33 units/μl. For comparison, our assays indicate that the concentration of heparin which inhibits native (wild-type) Taq polymerase is in the region of between 0.0005 and 0.0026 units/μl.

Resistance is conveniently expressed in terms of the inhibitor concentration which is found to inhibit the activity of the selected replicase, fusion protein or agent, compared to the concentration which is found to inhibit the native enzyme. Thus, the resistant replicases, fusion proteins, or agents selected by our invention may have 10×, 20×, 30×, 40×, 50×, 60×, 70×, 80×, 90×, 100×, 110×, 120×, 130×, 140×, 150×, 160×, 170×, 180×, 190×, 200×, or more resistance compared to the native enzyme. Most preferably, the resistant replicases etc. have 130× or more increased resistance when compared this way. The selected replicases etc. preferably have 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, or even 100% activity at the concentration of the inhibitory factor. Furthermore, the compartments may contain amounts of an inhibitory agent such as heparin to select for replicases having activity under such conditions.
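The fold-resistance comparison can be illustrated with the heparin concentration ranges quoted above. The sketch below simply divides one inhibitory concentration by another and brackets the result; the function names are hypothetical, and pairing the extremes of the two ranges is an illustrative choice rather than a procedure set out in this description.

```python
def fold_resistance(selected_conc: float, native_conc: float) -> float:
    """Ratio of the inhibitor concentration tolerated by the selected enzyme
    to the concentration that inhibits the native enzyme (same units for both)."""
    if selected_conc <= 0 or native_conc <= 0:
        raise ValueError("concentrations must be positive")
    return selected_conc / native_conc


def fold_resistance_bounds(selected_range: tuple, native_range: tuple) -> tuple:
    """Most conservative and most generous fold resistance from two (low, high) ranges."""
    sel_low, sel_high = selected_range
    nat_low, nat_high = native_range
    return (fold_resistance(sel_low, nat_high), fold_resistance(sel_high, nat_low))


if __name__ == "__main__":
    # Ranges in units/μl as quoted in the text for selected and native Taq polymerase.
    low, high = fold_resistance_bounds((0.083, 0.33), (0.0005, 0.0026))
    print(round(low, 1), round(high, 1))
```

Matched measurements of a single selected enzyme and the native enzyme in the same assay would give one value rather than a bracket.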

As explained below, the methods of our invention may be used to select for a pair of interacting polypeptides, and the conditions within the compartments may be altered to choose polypeptides capable of acting under these conditions (for example, high salt, or elevated temperature, etc.). The methods of our invention may also be used to select for the folding, stability and/or solubility of a fused polypeptide acting under these conditions (for example, high salt, or elevated temperature, chaotropic agents etc.).

The method of selection of our present invention may be used to select for various replicative activities, for example, for polymerase activity. Here, the replicase is a polymerase, and the catalytic reaction is the replication by the polymerase of its own gene. Thus, defective polymerases or polymerases which are inactive under the conditions under which the reaction is carried out (the selection conditions) are unable to amplify their own genes. Similarly, polymerases which are less active will replicate their coding sequences within their compartments more slowly. Accordingly, these genes will be under-represented, or even disappear from the gene pool.

Active polymerases, on the other hand, are able to replicate their own genes, and the resulting copy number of these genes will be increased. In a preferred embodiment of the invention, the copy number of a gene within the pool will bear a direct relation to the activity of the encoded polypeptide under the conditions under which the reaction is carried out. In this preferred embodiment, the most active polymerase will be most represented in the final pool (i.e., its copy number within the pool will be highest). As will be appreciated, this enables easy cloning of active polymerases over inactive ones. The method of our invention therefore is able to directly link the turnover rate of the enzyme to the resulting copy-number of the gene encoding it.

As an example, the method may be applied to the isolation of active polymerases (DNA-, RNA-polymerases and reverse transcriptases) from thermophilic organisms. Briefly, a thermostable polymerase is expressed intracellularly in bacterial cells and these are compartmentalised (e.g. in a water-oil emulsion) in appropriate buffer together with appropriate amounts of the four dNTPs and oligonucleotides priming at either end of the polymerase gene or on plasmid sequences flanking the polymerase gene. The polymerase and its gene are released from the cells by a temperature step that lyses the cells and destroys enzymatic activities associated with the host cell. Polymerases from mesophilic organisms (or less thermostable polymerases) may be expressed in an analogous way, except that cell lysis should either proceed at ambient temperature (e.g. by expression of a lytic protein (e.g. derived from lytic bacteriophages) or by detergent mediated lysis (e.g. Bugbuster™, commercially available)), or lysis may proceed at elevated temperature in the presence of a polymerase stabilizing agent (e.g. high concentrations of proline (see example 27) in the case of Klenow or trehalose in the case of RT). In such cases background polymerase activity of the host strain may interfere with selections and it may be preferable to make use of mutant strains (e.g. polA⁻).

Alternatively, polymerase genes (either as plasmids or linear fragments) may be compartmentalised as above and the polymerase expressed in situ within the compartments using in vitro transcription translation (ivt), followed by a temperature step to destroy enzymatic activities associated with the in vitro translation extract. Polymerases from mesophilic organisms (or less thermostable polymerases) may be expressed in situ in an analogous way, except that in order to avoid enzymatic activities associated with the in vitro translation extract it may be preferable to use a translation extract reconstituted from defined purified components like the PURE system (Shimizu et al., 2001, Nat. Biotech. 19, 751).

PCR thermocycling then leads to the amplification of the polymerase genes by the polypeptides they encode, i.e. only genes encoding active polymerases, or polymerases active under the chosen conditions, will be amplified. Furthermore, the copy number of a polymerase gene X after self-amplification will be directly proportional to the catalytic activity of the polymerase X it encodes (see FIGS. 1A and 1B).
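The stated proportionality between post-selection copy number and catalytic activity can be illustrated with a toy calculation. The sketch below is an idealisation under stated assumptions: each variant is confined to its own compartments, its gene copies are multiplied by a factor strictly proportional to a relative activity value, and no cross-compartment amplification occurs. The function name, variant labels and activity values are hypothetical.

```python
def pool_after_self_amplification(start_copies: dict, relative_activity: dict) -> dict:
    """Fraction of each variant in the pool after one round of self-amplification.

    Each variant's copy number is scaled by its relative activity, which is the
    idealised proportionality assumed for this illustration.
    """
    amplified = {
        name: copies * relative_activity[name]
        for name, copies in start_copies.items()
    }
    total = sum(amplified.values())
    if total == 0:
        raise ValueError("no variant was amplified")
    return {name: copies / total for name, copies in amplified.items()}


if __name__ == "__main__":
    # Hypothetical library: equal starting copies, different activities under selection.
    start = {"wild_type": 1000, "improved": 1000, "inactive": 1000}
    activity = {"wild_type": 1.0, "improved": 3.0, "inactive": 0.0}
    print(pool_after_self_amplification(start, activity))
```

In this idealised round the inactive variant disappears from the pool and the more active variant is enriched threefold over wild type; repeated rounds would compound the enrichment.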

By varying the selection conditions within the compartment, polymerases or other replicases with desired properties may be selected using the methods of our invention. Thus, by exposing repertoires of polymerase genes (diversified through targeted or random mutation) to self-amplification and by altering the conditions under which self-amplification can occur, the system can be used for the isolation and engineering of polymerases with altered, enhanced or novel properties. Such enhanced properties may include increased thermostability, increased processivity, increased accuracy (better proofreading), increased incorporation of unfavorable substrates (e.g., ribonucleotides, dye-modified nucleotides, general bases such as 5-nitroindole, or other unusual substrates such as pyrene nucleotides (Matray and Kool, 1999, Nature 399, 704-708) (FIG. 3)) or resistance to inhibitors (e.g. heparin in clinical samples). Novel properties may be the incorporation of unnatural substrates (e.g. ribonucleotides), bypass reading of damaged sites (e.g. abasic sites (Paz-Elizur T. et al., 1997, Biochemistry 36, 1766), thymidine-dimers (Wood R. D., 1999, Nature 399, 639), hydantoin-bases (Duarte V. et al., 199, Nucleic Acids Res. 27, 496)) and possibly even novel chemistries (e.g. novel backbones such as PNA (Nielsen P. E., 1999, Curr. Opin. Biotechnol. 10 (1), 71-5) or sulfone (Benner S. A. et al., 1998 Feb., Pure Appl. Chem. 70 (2), 263-6) or altered sugar chemistries (A. Eschenmoser, 1999, Science 284, 2118-24)). It may also be used to isolate or evolve factors that enhance or modify polymerase function such as processivity factors (like thioredoxin in the case of T7 DNA polymerase (Doublie S. et al., 1998, Nature 391, 251)).

However, other enzymes besides replicases, such as telomerases, helicases etc. may also be selected according to our invention. Thus, telomerase is expressed in situ (in compartments) by, for example, in vitro translation together with Telomerase-RNA (either added or transcribed in situ as well; e.g. Bachand et al., 2000, RNA 6, 778-784).

Compartments also contain Taq Pol, dNTPs and telomere specific primers. At low temperature Taq is inactive but active telomerase will append telomeres to its own encoding gene (a linear DNA fragment with appropriate ends). After the telomerase reaction, thermocycling only amplifies active telomerase encoding genes. Diversity can be introduced into the telomerase gene or RNA (or both) and could be targeted or random. As applied to selection of helicases, the selection method is essentially the same as described for telomerases, but helicase is used to unwind strands rather than heat denaturation.

The methods of our invention may also be used to select for DNA repair enzymes or translesion polymerases such as E. coli Pol IV and Pol V. Here, damage is introduced into primers (targeted chemistry) or randomly by mutagen treatment (e.g. UV, mutagenic chemicals etc.). This allows selection for enzymes able to repair the primers required for replication or their own gene sequence (information retrieval), resulting in improved “repairases” for gene therapy etc.

The methods of our invention may also be used in their various embodiments for selecting agents capable of directly or indirectly modulating replicase activity. In addition, the invention may be used to select for a pair of polypeptides capable of interacting, or for selection of catalytic nucleic acids such as catalytic RNA (ribozymes). These and other embodiments will be explained in further detail below.

Nucleic Acid Processing Enzymes

As referred to herein, a nucleic acid processing enzyme is any enzyme, which may be a protein enzyme or a nucleic acid enzyme, which is capable of modifying, extending (such as by at least one nucleotide), amplifying or otherwise influencing nucleic acids such as to render the nucleic acid selectable by amplification in accordance with the present invention. Such enzymes therefore possess an activity which results in, for example, amplification, stabilisation, destabilisation, hybridisation or denaturation, replication, protection or deprotection of nucleic acids, or any other activity on the basis of which a nucleic acid can be selected by amplification. Examples include helicases, telomerases, ligases, recombinases, integrases and replicases. Replicases are preferred.

Replicase/Replication

As used here, the term “replication” refers to the template-dependent copying of a nucleic acid sequence. Nucleic acids are discussed and exemplified below. In general, the product of the replication is another nucleic acid, whether of the same species, or of a different species. Thus, included are the replication of DNA to produce DNA, replication of DNA to produce RNA, replication of RNA to produce DNA and replication of RNA to produce RNA. “Replication” is therefore intended to encompass processes such as DNA replication, polymerisation, ligation of oligonucleotides or polynucleotides (e.g. tri-nucleotide (triplet) 5′ triphosphates) to form longer sequences, transcription, reverse transcription, etc.

The term “replicase” is intended to mean an enzyme having catalytic activity, which is capable of joining nucleotide building blocks together to form nucleic acid sequences. Such nucleotide building blocks include, but are not limited to, nucleosides, nucleoside triphosphates, deoxynucleosides, deoxynucleoside triphosphates, nucleotides (comprising a nitrogen-containing base such as adenine, guanine, cytosine, uracil, thymine, etc., a 5-carbon sugar and one or more phosphate groups), nucleotide triphosphates, deoxynucleotides such as deoxyadenosine, deoxythymidine, deoxycytidine, deoxyuridine, deoxyguanosine, deoxynucleotide triphosphates (dNTPs), and synthetic or artificial analogues of these. Building blocks also include oligomers or polymers of any of the above, for example, trinucleotides (triplets), oligonucleotides and polynucleotides.

Thus, a replicase may extend a pre-existing nucleic acid sequence (primer) by incorporating nucleotides or deoxynucleotides. Such an activity is known in the art as “polymerisation”, and the enzymes which carry this out are known as “polymerases”. An example of such a polymerase replicase is DNA polymerase, which is capable of replicating DNA. The primer may be chemically the same as, or different from, the extended sequence (for example, mammalian DNA polymerase is known to extend a DNA sequence from an RNA primer). The term replicase also includes those enzymes which join together nucleic acid sequences, whether polymers or oligomers, to form longer nucleic acid sequences. Such an activity is exhibited by the ligases, which ligate pieces of DNA or RNA.

The replicase may consist entirely of replicase sequence, or it may comprise a replicase sequence linked to a heterologous polypeptide or other molecule such as an agent by chemical means or in the form of a fusion protein or be assembled from two or more constituent parts.

Preferably, the replicase according to the invention is a DNA polymerase, RNA polymerase, reverse transcriptase, DNA ligase, or RNA ligase.

Preferably, the replicase is a thermostable replicase. A “thermostable” replicase as used here is a replicase which demonstrates significant resistance to thermal denaturation at elevated temperatures, typically above body temperature (37° C.). Preferably, such a temperature is in the range 42° C. to 160° C., more preferably, between 60° C. and 100° C., most preferably, above 90° C. Compared to a non-thermostable replicase, the thermostable replicase displays a significantly increased half-life (time of incubation at elevated temperature that results in 50% loss of activity). Preferably, the thermostable replicase retains 30% or more of its activity after incubation at the elevated temperature, more preferably, 40%, 50%, 60%, 70% or 80% or more of its activity. Yet more preferably, the replicase retains 80% activity. Most preferably, the activity retained is 90%, 95% or more, even 100%. Non-thermostable replicases would exhibit little or no retention of activity after similar incubations at the elevated temperature.

Polymerase

An example of a replicase is DNA polymerase. DNA polymerase enzymes are naturally occurring intracellular enzymes, and are used by a cell to replicate a nucleic acid strand using a template molecule to manufacture a complementary nucleic acid strand. Enzymes having DNA polymerase activity catalyze the formation of a bond between the 3′ hydroxyl group at the growing end of a nucleic acid primer and the 5′ phosphate group of a nucleotide triphosphate. These nucleotide triphosphates are usually selected from deoxyadenosine triphosphate (A), deoxythymidine triphosphate (T), deoxycytidine triphosphate (C) and deoxyguanosine triphosphate (G). However, DNA polymerases may incorporate modified or altered versions of these nucleotides. The order in which the nucleotides are added is dictated by base pairing to a DNA template strand; such base pairing is accomplished through “canonical” hydrogen-bonding (hydrogen-bonding between A and T nucleotides and G and C nucleotides of opposing DNA strands), although non-canonical base pairing, such as G:U base pairing, is known in the art. See e.g., Adams et al., The Biochemistry of the Nucleic Acids 14-32 (11th ed. 1992). The in-vitro use of enzymes having DNA polymerase activity has in recent years become more common in a variety of biochemical applications including cDNA synthesis and DNA sequencing reactions (see Sambrook et al., (2nd ed. Cold Spring Harbor Laboratory Press, 1989) hereby incorporated by reference herein), and amplification of nucleic acids by methods such as the polymerase chain reaction (PCR) (Mullis et al., U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,800,159, hereby incorporated by reference herein) and RNA transcription-mediated amplification methods (e.g., Kacian et al., PCT Publication No. WO91/01384).

Methods such as PCR make use of cycles of primer extension through the use of a DNA polymerase activity, followed by thermal denaturation of the resulting double-stranded nucleic acid in order to provide a new template for another round of primer annealing and extension. Because the high temperatures necessary for strand denaturation result in the irreversible inactivation of many DNA polymerases, the discovery and use of DNA polymerases able to remain active at temperatures above about 37° C. to 42° C. (thermostable DNA polymerase enzymes) provides an advantage in cost and labor efficiency. Thermostable DNA polymerases have been discovered in a number of thermophilic organisms including, but not limited to, Thermus aquaticus, Thermus thermophilus, and species of the Bacillus, Thermococcus, Sulfolobus and Pyrococcus genera. DNA polymerases can be purified directly from these thermophilic organisms. However, substantial increases in the yield of DNA polymerase can be obtained by first cloning the gene encoding the enzyme in a multicopy expression vector by recombinant DNA technology methods, inserting the vector into a host cell strain capable of expressing the enzyme, culturing the vector-containing host cells, then extracting the DNA polymerase from a host cell strain which has expressed the enzyme.

The bacterial DNA polymerases that have been characterized to date have certain patterns of similarities and differences which have led some to divide these enzymes into two groups: those whose genes contain introns/inteins (Class B DNA polymerases), and those whose DNA polymerase genes are roughly similar to that of E. coli DNA polymerase I and do not contain introns (Class A DNA polymerases).

Several Class A and Class B thermostable DNA polymerases derived from thermophilic organisms have been cloned and expressed. Among the Class A enzymes: Lawyer et al., 1989, J. Biol. Chem. 264, 6427-6437, and Gelfand et al., U.S. Pat. No. 5,079,352, report the cloning and expression of a full length thermostable DNA polymerase derived from Thermus aquaticus (Taq). Lawyer et al., 1993, PCR Methods and Applications 2, 275-287, and Barnes, PCT Publication No. WO92/06188 (1992), disclose the cloning and expression of truncated versions of the same DNA polymerase, while Sullivan, EPO Publication No. 0482714A1 (1992), reports cloning a mutated version of the Taq DNA polymerase. Asakura et al., 1993, J. Ferment. Bioeng. (Japan) 74, 265-269, have reportedly cloned and expressed a DNA polymerase from Thermus thermophilus. Gelfand et al., PCT Publication No. WO92/06202 (1992), have disclosed a purified thermostable DNA polymerase from Thermosipho africanus. A thermostable DNA polymerase from Thermus flavus is reported by Akhmetzjanov and Vakhitov, 1992, Nucleic Acids Res. 20, 5839. Uemori et al., 1993, J. Biochem. 113, 401-410 and EPO Publication No. 0517418A2 (1992) have reported cloning and expressing a DNA polymerase from the thermophilic bacterium Bacillus caldotenax. Ishino et al., Japanese Patent Application No. HEI 4[1992]-131400 (publication date Nov. 19, 1993) report cloning a DNA polymerase from Bacillus stearothermophilus. Among the Class B enzymes: A recombinant thermostable DNA polymerase from Thermococcus litoralis is reported by Comb et al., EPO Publication No. 0455 430 A3 (1991), Comb et al., EPO Publication No. 0547920A2 (1993), and Perler et al., 1992, Proc. Natl. Acad. Sci. USA 89, 5577-5581. A cloned thermostable DNA polymerase from Sulfolobus solfataricus is disclosed in Pisani et al., 1992, Nucleic Acids Res. 20, 2711-2716 and in PCT Publication WO93/25691 (1993). The thermostable enzyme of Pyrococcus furiosus is disclosed in Uemori et al., 1993, Nucleic Acids Res. 21, 259-265, while a recombinant DNA polymerase is derived from Pyrococcus sp. as disclosed in Comb et al., EPO Publication No. 0547359A1 (1993).

Many thermostable DNA polymerases possess activities additional to a DNA polymerase activity; these may include a 5′-3′ exonuclease activity and/or a 3′-5′ exonuclease activity. The activities of 5′-3′ and 3′-5′ exonucleases are well known to those of ordinary skill in the art. The 3′-5′ exonuclease activity improves the accuracy of the newly-synthesized strand by removing incorrect bases that may have been incorporated; DNA polymerases in which such activity is low or absent, reportedly including Taq DNA polymerase (see Lawyer et al., J. Biol. Chem. 264, 6427-6437), have elevated error rates in the incorporation of nucleotide residues into the primer extension strand. In applications such as nucleic acid amplification procedures in which the replication of DNA is often geometric in relation to the number of primer extension cycles, such errors can lead to serious artifactual problems such as sequence heterogeneity of the nucleic acid amplification product (amplicon). Thus, a 3′-5′ exonuclease activity is a desired characteristic of a thermostable DNA polymerase used for such purposes.

By contrast, the 5′-3′ exonuclease activity often present in DNA polymerase enzymes is often undesired in a particular application since it may digest nucleic acids, including primers, that have an unprotected 5′ end. Thus, an attenuated or absent 5′-3′ exonuclease activity is also a desired characteristic of a thermostable DNA polymerase for biochemical applications. Various DNA polymerase enzymes have been described in which a modification has been introduced to accomplish this object. For example, the Klenow fragment of E. coli DNA polymerase I can be produced as a proteolytic fragment of the holoenzyme in which the domain of the protein controlling the 5′-3′ exonuclease activity has been removed. The Klenow fragment still retains the polymerase activity and the 3′-5′ exonuclease activity. Barnes, supra, and Gelfand et al., U.S. Pat. No. 5,079,352 have produced 5′-3′ exonuclease-deficient recombinant Taq DNA polymerases. Ishino et al., EPO Publication No. 0517418A2, have produced a 5′-3′ exonuclease-deficient DNA polymerase derived from Bacillus caldotenax. On the other hand, polymerases lacking the 5′-3′ exonuclease domain often have reduced processivity.

Ligase

DNA strand breaks and gaps are generated transiently during replication, repair and recombination. In mammalian cell nuclei, rejoining of such strand breaks depends on several different DNA polymerases and DNA ligase enzymes. The mechanism for joining of DNA strand interruptions by DNA ligase enzymes has been widely described. The reaction is initiated by the formation of a covalent enzyme-adenylate complex. Mammalian and viral DNA ligase enzymes employ ATP as cofactor, whereas bacterial DNA ligase enzymes use NAD to generate the adenylyl group. In the case of ATP-utilising ligases, the ATP is cleaved to AMP and pyrophosphate with the adenylyl residue linked by a phosphoramidate bond to the ε-amino group of a specific lysine residue at the active site of the protein (Gumport, R. I. et al., 1971, PNAS 68, 2559-63). The activated AMP residue of the DNA ligase-adenylate intermediate is transferred to the 5′ phosphate terminus of a single strand break in double stranded DNA to generate a covalent DNA-AMP complex with a 5′-5′ phosphoanhydride bond. This reaction intermediate has also been isolated for microbial and mammalian DNA ligase enzymes, but is shorter lived than the adenylylated enzyme. In the final step of DNA ligation, unadenylylated DNA ligase enzymes required for the generation of a phosphodiester bond catalyze displacement of the AMP residue through attack by the adjacent 3′-hydroxyl group on the adenylylated site.

The occurrence of three different DNA ligase enzymes, DNA Ligase I, II and III, has been established previously by biochemical and immunological characterization of purified enzymes (Tomkinson, A. E. et al., 1991, J. Biol. Chem. 266, 21728-21735, and Roberts, E. et al., 1994, J. Biol. Chem. 269, 3789-3792).

Amplification

The methods of our invention involve the templated amplification of desired nucleic acids. “Amplification” refers to the increase in the number of copies of a particular nucleic acid fragment (or a portion of this) resulting either from an enzymatic chain reaction (such as a polymerase chain reaction, a ligase chain reaction, or a self-sustained sequence replication) or from the replication of all or part of the vector into which it has been cloned. Preferably, the amplification according to our invention is an exponential amplification, as exhibited by for example the polymerase chain reaction.

Many target and signal amplification methods have been described in the literature; general reviews of these methods are given, for example, in Landegren, U. et al., 1988, Science 242, 229-237, and Lewis, R., 1990, Genetic Engineering News 10:1, 54-55. These amplification methods may be used in the methods of our invention, and include polymerase chain reaction (PCR), PCR in situ, ligase amplification reaction (LAR), ligase hybridization, Qβ bacteriophage replicase, transcription-based amplification system (TAS), genomic amplification with transcript sequencing (GAWTS), nucleic acid sequence-based amplification (NASBA) and in situ hybridization.

Polymerase Chain Reaction (PCR)

PCR is a nucleic acid amplification method described inter alia in U.S. Pat. Nos. 4,683,195 and 4,683,202. PCR consists of repeated cycles of DNA polymerase generated primer extension reactions. The target DNA is heat denatured and two oligonucleotides, which bracket the target sequence on opposite strands of the DNA to be amplified, are hybridized. These oligonucleotides become primers for use with DNA polymerase. The DNA is copied by primer extension to make a second copy of both strands. By repeating the cycle of heat denaturation, primer hybridization and extension, the target DNA can be amplified a million fold or more in about two to four hours. PCR is a molecular biology tool, which must be used in conjunction with a detection technique to determine the results of amplification. An advantage of PCR is that it increases sensitivity by amplifying the amount of target DNA by 1 million to 1 billion fold in approximately 4 hours.
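By way of illustration only, the fold amplification figures quoted above can be related to the number of thermal cycles with a short calculation. The sketch below assumes an idealised model in which each cycle multiplies the template by (1 + efficiency); the function names and the per-cycle efficiency parameter are illustrative assumptions and are not taken from this specification.

```python
import math


def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealised fold amplification after a given number of PCR cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 corresponds to perfect doubling).
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    return (1.0 + efficiency) ** cycles


def cycles_needed(target_fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles giving at least target_fold."""
    if target_fold < 1.0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("target_fold must be >= 1 and efficiency within (0, 1]")
    return math.ceil(math.log(target_fold) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    for fold in (1e6, 1e9):
        print(f"{fold:.0e}-fold needs {cycles_needed(fold)} ideal cycles")
```

Under perfect doubling, about 20 cycles correspond to a million-fold amplification and about 30 cycles to a billion-fold amplification; with a lower assumed efficiency, more cycles are needed for the same yield.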

The polymerase chain reaction may be used in the selection methods of our invention as follows. For example, PCR may be used to select for variants of Taq polymerase having polymerase activity. As described in further detail above, a library of nucleic acids each encoding a replicase or a variant of the replicase, for example, Taq polymerase, is generated and subdivided into compartments. Each compartment comprises substantially one member of the library together with the replicase or variant encoded by that member.

The polymerase or variant may be expressed in vivo within a transformed bacterium or any other suitable expression host, for example yeast or insect or mammalian cells, and the expression host encapsulated within a compartment. Heat or other suitable means is applied to disrupt the host and to release the polymerase variant and its encoding nucleic acid within the compartment. In the case of a bacterial host, timed expression of a lytic protein, for example protein E from ΦX174, or use of an inducible λ lysogen, may be employed for disrupting the bacterium.

It will be clear that the polymerase or other enzyme need not be a heterologous protein expressed in that host (e.g., from a plasmid), but may be expressed from a gene forming part of the host genome. Thus, the polymerase may be for example an endogenous or native bacterial polymerase. We have shown that in the case of nucleoside diphosphate kinase (ndk), endogenous (uninduced) expression of ndk is sufficient to generate dNTPs for its own replication. Thus, the methods of selection according to our invention may be employed for the direct functional cloning of polymerases and other enzymes from diverse (and uncultured) microbial populations.

Alternatively, the nucleic acid library may be compartmentalised together with components of an in vitro transcription/translation system (as described in further detail in this document), and the polymerase variant expressed in vitro within the compartment.

Each compartment also comprises components for a PCR reaction, for example, nucleotide triphosphates (dNTPs), buffer, magnesium, and oligonucleotide primers. The oligonucleotide primers may have sequences corresponding to sequences flanking the polymerase gene (i.e., within the genomic or vector DNA) or to sequences within the polymerase gene. PCR thermal cycling is then initiated to allow any polymerase variant having polymerase activity to amplify the nucleic acid sequence.

Active polymerases will amplify their corresponding nucleic acid sequences, while nucleic acid sequences encoding weakly active or inactive polymerases will be weakly replicated or not be replicated at all. In general, the final copy number of each member of the nucleic acid library will be expected to be proportional to the level of activity of the polymerase variant encoded by it. Nucleic acids encoding active polymerases will be over-represented, and nucleic acids encoding inactive or weakly active polymerases will be under-represented. The resulting amplified sequences may then be cloned and sequenced, etc., and replication ability of each member assayed.
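The enrichment described above can be illustrated with a simple numerical sketch. It assumes, purely for illustration, that each library member occupies its own compartment and that the activity of the encoded polymerase can be represented as a per-cycle replication efficiency between 0 and 1; the function name, the variant labels and the numerical values are hypothetical and do not come from the Examples.

```python
from typing import Dict


def selection_round(pool: Dict[str, float],
                    activity: Dict[str, float],
                    cycles: int = 20) -> Dict[str, float]:
    """Return variant frequencies after one round of self-replication.

    pool maps a variant name to its starting frequency in the library.
    activity maps a variant name to an assumed per-cycle replication
    efficiency in [0, 1], standing in for polymerase activity.
    """
    # Each gene is copied only by the polymerase it encodes, so its copy
    # number after thermocycling depends on that variant's activity alone.
    copies = {name: freq * (1.0 + activity[name]) ** cycles
              for name, freq in pool.items()}
    total = sum(copies.values())
    return {name: n / total for name, n in copies.items()}


if __name__ == "__main__":
    library = {"active": 0.01, "weak": 0.09, "inactive": 0.90}
    efficiencies = {"active": 0.9, "weak": 0.3, "inactive": 0.0}
    for name, freq in selection_round(library, efficiencies).items():
        print(f"{name}: {freq:.4f}")
```

In this hypothetical example the active variant starts as 1% of the library and dominates the output after a single round, while the inactive variant, which is not replicated at all, becomes a very small fraction of the recovered sequences.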

As described in further detail elsewhere, the conditions within each compartment may be altered to select for polymerases active under these conditions. For example, heparin may be added to the reaction mix to select polymerases that are resistant to heparin. The temperature at which PCR takes place may be elevated to select for heat resistant variants of polymerase. Furthermore, polymerases may be selected which are capable of extending DNA sequences such as primers with altered 3′ ends or altered parts of the primer sequence. The altered 3′ ends or other alterations can include unnatural bases (altered sugar or base moieties), modified bases (e.g. blocked 3′ ends) or even primers with altered backbone chemistries (e.g. PNA primers).

Reverse Transcriptase-PCR

RT-PCR is used to amplify RNA targets. In this process, the reverse transcriptase enzyme is used to convert RNA to complementary DNA (cDNA), which can then be amplified using PCR. This method has proven useful for the detection of RNA viruses.

The methods of our invention may employ RT-PCR. Thus, the pool of nucleic acids encoding the replicase or its variants may be provided in the form of an RNA library. This library could be generated in vivo in bacteria, mammalian cells, yeast etc., which are compartmentalised, or by in vitro transcription of compartmentalised DNA. The RNA could encode a co-compartmentalised replicase (e.g. reverse transcriptase or polymerase) that has been expressed in vivo (and released in emulsion along with the RNA by means disclosed below) or in vitro. Other components necessary for amplification (polymerase and/or reverse transcriptase, dNTPs, primers) are also compartmentalised. Under given selection pressure(s), the cDNA product of the reverse transcription reaction serves as a template for PCR amplification. As with other replication reactions (in particular ndk in the Examples) the RNA may encode a range of enzymes feeding the reaction.

Self-Sustained Sequence Replication (3SR)

Self-sustained sequence replication (3SR) is a variation of TAS, which involves the isothermal amplification of a nucleic acid template via sequential rounds of reverse transcriptase (RT), polymerase and nuclease activities that are mediated by an enzyme cocktail and appropriate oligonucleotide primers (Guatelli et al., 1990, Proc. Natl. Acad. Sci. USA 87, 1874). Enzymatic degradation of the RNA of the RNA/DNA heteroduplex is used instead of heat denaturation. RNAse H and all other enzymes are added to the reaction and all steps occur at the same temperature and without further reagent additions. Following this process, amplifications of 10⁶ to 10⁹ have been achieved in one hour at 42° C.

The methods of our invention may therefore be extended to select polymerases or replicases from mesophilic organisms using 3SR isothermal amplification (Guatelli et al., 1990, Proc. Natl. Acad. Sci. USA 87, 7797; Compton, 1991, Nature 7:350, 91-92) instead of PCR thermocycling. As described above, 3SR involves the concerted action of two enzymes: an RNA polymerase and a reverse transcriptase cooperate in a coupled reaction of transcription and reverse transcription, leading to the simultaneous amplification of both RNA and DNA. Clearly, in this system self-amplification may be applied to either of the two enzymes involved or to both simultaneously. It may also include the evolution of the RNAse H activity either as part of the reverse transcriptase enzyme (e.g. HIV-1 RT) or on its own.

The various enzymatic activities that define 3SR and related methods are all targets for selection using the methods of our invention. Variants of T7 RNA polymerase, reverse transcriptase (RT), or RNAse H can be provided within the aqueous compartments of the emulsions, and selected for under otherwise limiting conditions. These variants can be introduced via E. coli “gene pellets” (i.e., bacteria expressing the polypeptide), or other means as described elsewhere in this document. Initial release in emulsion may be mediated by enzymatic (for example, lambda lysogen) or thermal lysis, or other methods as disclosed here. The latter may necessitate the use of agents that stabilize enzymatic activity at transiently elevated temperatures. For example, it may be necessary to include amounts of proline, glycerol, trehalose or other stabilising agents as known in the art to effect stabilisation of thermosensitive enzymes such as reverse transcriptase. Furthermore, stepwise removal of the agent may be undertaken to select for increased stability of the thermosensitive enzyme.

Alternatively, and as disclosed elsewhere, variants may be produced via coupled transcription/translation, with the expressed products feeding into the 3SR cycle.

It will also be appreciated that it is possible to replace reverse transcriptase with the thermostable Tth DNA polymerase. Tth DNA polymerase is known to have reverse transcriptase activity and the RNA template is effectively reverse-transcribed into template DNA using this enzyme. It is therefore possible to select for useful variants of this enzyme, by for example, introducing bacterially expressed T7 RNA polymerase variants into emulsion and preincubation at an otherwise non-permissive temperature.

Example 18 below shows one way in which the methods of our invention may be applied to selection of replicases using self-sustained sequence replication (3SR).

Ligation Amplification (LAR/LAS)

Ligation amplification reaction or ligation amplification system uses DNA ligase and four oligonucleotides, two per target strand. This technique is described by Wu, D. Y. and Wallace, R. B., 1989, Genomics 4, 560. The oligonucleotides hybridize to adjacent sequences on the target DNA and are joined by the ligase. The reaction is heat denatured and the cycle repeated.

By analogy to the application to polymerases, our method may be applied to ligases, in particular from thermophilic organisms. Oligonucleotides complementary to one strand of the ligase gene sequence are synthesized (either as perfect match or comprising targeted or random diversity). The two end oligos overlap into the vector or untranslated regions of the ligase gene. The ligase gene is cloned for expression in an appropriate host and compartmentalized together with the oligonucleotides and an appropriate energy source (usually ATP or NAD). If necessary, the ligase expressed as above in bacteria is released from the cells by thermal lysis. Compartments contain appropriate buffer together with appropriate amounts of an appropriate energy source (ATP or NAD) and oligonucleotides encoding the whole of the ligase gene as well as flanking sequences required for cloning. Ligation of oligonucleotides leads to assembly of a full-length ligase gene (templated by the ligase gene on the expression plasmid) by an active ligase. In compartments containing an inactive ligase, no assembly will take place. As with polymerases, the copy number of a ligase gene X after self-ligation will preferably be proportional to the catalytic activity under the selection conditions of the ligase X it encodes.

After lysis of the cell, thermocycling leads to annealing of the oligonucleotides to the ligase gene. However, ligation of the oligos and thus assembly of the full-length ligase gene depends on the presence of an active ligase in the same compartment. Thus only genes encoding active ligases will assemble their own encoding genes from the oligonucleotides present. Assembled genes can then be amplified, diversified and recloned for another round of selection if necessary. The methods of our invention are therefore suitable for the selection of ligases which are faster or more efficient at ligation.

As noted elsewhere, the ligase can be produced either in situ by expression from a suitable bacterial or other host, or by in vitro translation. The ligase may be an oligonucleotide (e.g. ribozyme or deoxyribozyme) ligase assembling its own sequence from available fragments, or the ligase may be a conventional (polypeptide) ligase. The length of the oligonucleotides will depend on the particular reaction, but if necessary, they can be very short (e.g. triplets). As noted elsewhere, the method of our invention may be used to select for an agent capable of modulating ligase activity, either directly or indirectly. For example, the gene to be evolved may encode another enzyme or enzymes that generate a substrate for the ligase (e.g. NAD) or consume an inhibitor. In this case the oligonucleotides encode parts of the other enzyme or enzymes etc.

The ligation reaction between oligonucleotides may incorporate alternative chemistries, e.g. amide linkages. As long as the chemical linkages do not interfere with templated copying of the opposite strand by any replicase (e.g. reverse transcriptase), a wide variety of linkage chemistries, and ligases that catalyse them, may be evolved.

Qβ Replicase

In this technique, RNA replicase for the bacteriophage Qβ, which replicates single-stranded RNA, is used to amplify the target DNA, as described by Lizardi et al., 1988, Bio/Technology 6, 1197. First, the target DNA is hybridized to a primer including a T7 promoter and a Qβ 5′ sequence region. Using this primer, reverse transcriptase generates a cDNA connecting the primer to its 5′ end in the process. These two steps are similar to the TAS protocol. The resulting heteroduplex is heat denatured. Next, a second primer containing a Qβ 3′ sequence region is used to initiate a second round of cDNA synthesis. This results in a double stranded DNA containing both 5′ and 3′ ends of the Qβ bacteriophage as well as an active T7 RNA polymerase binding site. T7 RNA polymerase then transcribes the double-stranded DNA into new RNA, which mimics the Qβ. After extensive washing to remove any unhybridized probe, the new RNA is eluted from the target and replicated by Qβ replicase. The latter reaction creates 10⁷ fold amplification in approximately 20 minutes. Significant background may be formed due to minute amounts of probe RNA that is non-specifically retained during the reaction.

A reaction employing Qβ replicase as described above may be used to build a continuous selection reaction in an alternative embodiment according to our invention.

For example, the gene for Qβ replicase (with appropriate 5′ and 3′ regions) is added to an in vitro translation reaction and compartmentalised. In compartments, the replicase is expressed and immediately starts to replicate its own gene. Only genes encoding an active replicase replicate themselves. Replication proceeds until NTPs are exhausted. However, as NTPs can be made to diffuse through the emulsion (see the description of ndk in the Examples), the replication reaction may be “fed” from the outside and proceed much longer, essentially until there is no room left within the compartments for further replication. It is possible to propagate the reaction further by serial dilution of the emulsion mix into a fresh oil-phase and re-emulsification after addition of a fresh water-phase containing NTPs. Qβ replicase is known to be very error-prone, so replication alone will introduce a large amount of random diversity (which may be desirable). The methods described here allow the evolution of more specific (e.g. primer dependent) forms of Qβ replicase. As with other replication reactions (in particular ndk in the Examples) a range of enzymes feeding the reaction may be evolved.
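The effect of “feeding” a compartment with NTPs can be illustrated with a toy model. The sketch below assumes discrete replication steps in which each existing gene copy can be copied at most once per step and each new copy consumes a fixed number of NTPs; the function name, parameters and numbers are hypothetical and serve only to show that replication stalls when NTPs are exhausted and continues when they are resupplied.

```python
def replicate_until_exhausted(initial_copies: int,
                              ntp_pool: int,
                              ntps_per_copy: int,
                              feed_per_step: int = 0,
                              max_steps: int = 100) -> int:
    """Return the gene copy number reached in one compartment.

    ntp_pool is the number of NTPs initially present; feed_per_step is the
    number of NTPs assumed to diffuse into the compartment at each step.
    """
    copies = initial_copies
    for _ in range(max_steps):
        ntp_pool += feed_per_step
        affordable = ntp_pool // ntps_per_copy
        # Each template is copied at most once per step, NTPs permitting.
        new_copies = min(copies, affordable)
        if new_copies == 0:
            break  # NTPs exhausted and no feed: replication stops
        copies += new_copies
        ntp_pool -= new_copies * ntps_per_copy
    return copies


if __name__ == "__main__":
    print(replicate_until_exhausted(1, 1000, 100))
    print(replicate_until_exhausted(1, 1000, 100, feed_per_step=100, max_steps=20))
```

In this model the unfed compartment stops once its initial NTP pool is consumed, whereas the fed compartment keeps replicating at a rate limited by the supply of NTPs. The upper limit imposed by the volume of the compartment is not modelled here.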

Other Amplification Techniques

Alternative amplification technology may be exploited in the present invention. For example, rolling circle amplification (Lizardi et al., 1998, Nat. Genet. 19, 225) is an amplification technology available commercially (RCAT™) which is driven by DNA polymerase and can replicate circular oligonucleotide probes with either linear or geometric kinetics under isothermal conditions.

In the presence of two suitably designed primers, a geometric amplification occurs via DNA strand displacement and hyperbranching to generate 10¹² or more copies of each circle in 1 hour.

If a single primer is used, RCAT generates in a few minutes a linear chain of thousands of tandemly linked DNA copies of a target covalently linked to that target.

A further technique, strand displacement amplification (SDA; Walker et al., 1992, PNAS (USA) 89, 392) begins with a specifically defined sequence unique to a specific target. Unlike other techniques which rely on thermal cycling, SDA is an isothermal process that utilizes a series of primers, DNA polymerase and a restriction enzyme to exponentially amplify the unique nucleic acid sequence.

SDA comprises both a target generation phase and an exponential amplification phase.

In target generation, double-stranded DNA is heat denatured creating two single-stranded copies. A series of specially manufactured primers combine with DNA polymerase (amplification primers for copying the base sequence and bumper primers for displacing the newly created strands) to form altered targets capable of exponential amplification.

The exponential amplification process begins with altered targets (single-stranded partial DNA strands with restriction enzyme recognition sites) from the target generation phase.

An amplification primer is bound to each strand at its complementary DNA sequence. DNA polymerase then uses the primer to identify a location to extend the primer from its 3′ end, using the altered target as a template for adding individual nucleotides. The extended primer thus forms a double-stranded DNA segment containing a complete restriction enzyme recognition site at each end.

A restriction enzyme is then bound to the double-stranded DNA segment at its recognition site. The restriction enzyme dissociates from the recognition site after having cleaved only one strand of the double-stranded segment, forming a nick. DNA polymerase recognizes the nick and extends the strand from the site, displacing the previously created strand. The recognition site is thus repeatedly nicked and restored by the restriction enzyme and DNA polymerase with continuous displacement of DNA strands containing the target segment.

Each displaced strand is then available to anneal with amplification primers as above. The process continues with repeated nicking, extension and displacement of new DNA strands, resulting in exponential amplification of the original DNA target.

Selection of Catalytic RNA

Known methods of in vitro evolution have been used to generate catalytically active RNA molecules (ribozymes) with a diverse range of activities. However, these have involved selection by self-modification, which inherently isolates variants that rely on proximity catalysis and which display reduced activities in trans.

Compartmentalisation affords a means to select for truly trans-acting ribozymes capable of multiple turnover, without the need to tether substrate to the ribozyme by covalent linkage or hydrogen-bonding (i.e., base-pairing) interactions.

In its simplest case, a gene encoding a ribozyme can be introduced into emulsion and readily transcribed, as demonstrated by the transcription and the 3SR amplification of the RNA encoding Taq polymerase in situ as follows. The Taq polymerase gene is first transcribed in emulsion. 100 μl of a reaction mix comprising 80 mM HEPES-KOH (pH 7.5), 24 mM MgCl₂, 2 mM spermidine, 40 mM DTT, rNTPs (30 mM), 50 ng T7-Taq template (see Example 18. Selection Using Self-Sustained Sequence Replication (3SR)), 60 units T7 RNA polymerase (USB) and 40 units RNAsin (Promega) is emulsified using the standard protocol. Emulsions are incubated at 37° C. for up to 6 hours. Analysis of reaction products by gel electrophoresis showed levels of RNA production to be comparable to those of the non-emulsified control.

By creating a 5′ overhang (e.g. by ligation of either DNA or RNA adaptors) in the emulsified gene, RNA variants are selected for the ability to carry out the template directed addition of successive dNTPs in trans (i.e. polymerase activity, see FIG. 6). Genes that have been “filled-in” may be rescued by PCR using primers complementary to the single-stranded region of the gene (i.e., the region which is single stranded prior to ribozyme fill-in) or by capture of biotin (or otherwise) modified nucleotides that are incorporated, followed by PCR. In compartments without catalytic RNA activity, this region remains single stranded, and PCR will fail to amplify the template (alternatively no nucleotides are incorporated and the template is not captured but washed away).

A coupling approach can also be used to further extend the range of enzymatic activities that could be selected for. For example, co-emulsification of a DNA polymerase with the gene described above (5′ overhang) can be used to select for ribozymes that convert an otherwise unsuitable NTP substrate into one that can be utilised by the polymerase. As before, the “filled-in” gene can then be rescued by PCR. The above approach can also be used to select for protein polymerase enzyme produced in situ from a similar template (i.e. with 3′ overhang). A diagram showing the selection of RNA having catalytic activity is shown as FIG. 6.

Selection of Agents Capable of Modifying Replicase Activity

In another embodiment, our invention is used to select for an agent capable of modifying the activity of a replicase. In this embodiment, a pool of nucleic acids is generated comprising members encoding one or more candidate agents. Members of the nucleic acid library are compartmentalised together with a replicase (which, as explained above, is able only to act on the nucleic acid encoding the agent).

The candidate agents may be functionally or chemically distinct from each other, or they may be variants of an agent known or suspected to be capable of modulating replicase activity. Members of the pool are then segregated into compartments together with the polypeptides or polynucleotides encoded by them, so that preferably each compartment comprises a single member of the pool together with its cognate encoded polypeptide. Each compartment also comprises one or more molecules of the replicase. Thus, the encoded polypeptide agent is able to modulate the activity of the replicase, to prevent or enhance replication of the compartmentalised nucleic acid (i.e., the nucleic acid encoding the agent). In this way, the polypeptide agent is able to act via the replicase to increase or decrease the number of molecules of its encoding nucleic acid. In a highly preferred embodiment of the invention, the agent is capable of enhancing replicase activity, to enable detection or selection of the agent by detecting the encoding nucleic acid.

The modulating agent may act directly or indirectly on the replicase. For example, the modulating agent may be an enzyme comprising an activity which acts on the replicase molecule, for example, by a post-translational modification of replicase, to activate or inactivate the replicase. The agent may act by removing a ligand from, or attaching a ligand to, the replicase molecule. It is known that many replicases such as polymerases and ligases are regulated by phosphorylation, so that in preferred embodiments the agent according to the invention is a kinase or a phosphorylase. The modulating agent may also directly interact with the replicase and modify its properties (e.g. thioredoxin with T7 DNA polymerase, or members of the replisome, e.g. clamp, helicase etc., with DNA polymerase III).

Alternatively, the modulating agent may exert its effects on the replicase in an indirect manner. For example, modulation of replicase activity may take place via a third body, which third body is modified by the modulating agent, for example as described above.

Furthermore, the modulating agent may be an enzyme which forms part of a pathway which produces as an end product a substrate for the replicase. In this embodiment, the modulating agent is involved in the synthesis of an intermediate (or the end product) of the pathway. Accordingly, the rate of replication (and hence the amount of nucleic acid encoding the agent) is dependent on the activity of the modulating agent.

For example, the modulating agent may be a kinase that is involved in the biosynthesis of bases, deoxyribonucleosides, deoxyribonucleotides such as dAMP, dCMP, dGMP and dTMP, deoxyribonucleoside diphosphates (such as dADP, dCDP, dGDP and dTDP), deoxyribonucleoside triphosphates such as dATP, dCTP, dGTP or dTTP, or nucleosides, nucleotides such as AMP, CMP, GMP and UMP, nucleoside diphosphates (such as ADP, CDP, GDP and UDP), nucleoside triphosphates such as ATP, CTP, GTP or UTP, etc. The modulating agent may be involved in the synthesis of other intermediates in the biosynthesis of nucleotides (as described and well known from biochemical textbooks such as Stryer or Lehninger), such as IMP, 5-phospho-α-D-ribose-1-pyrophosphoric acid, 5-phospho-β-D-ribosylamine, 5-phosphoribosyl-glycinamide, 5-phosphoribosyl-N-formylglycinamide, etc. Thus, the agent may comprise an enzyme such as ribosephosphate pyrophosphokinase, phosphoribosylglycinamide synthetase, etc. Other examples of such agents will be apparent to those skilled in the art. The methods of our invention allow the selection of such agents with improved catalytic activity.

In yet another embodiment, the modulator functions to “unblock” a constituent of the replication cocktail (primers, dNTP, replicase etc.). An example of a blocked constituent would be a primer or dNTP with a chemical moiety attached that inhibits the replicase used in the CSR cycle. Alternatively, the pair of primers used could be covalently tethered by a linking agent, with cleavage of the linking agent by the modulator allowing both primers to amplify the gene encoding the modulator in the presence of supplemented replicase. An example of a linking agent would be a peptide nucleic acid (PNA). Additionally, by designing a large oligonucleotide that encodes a pair of primer sequences interspersed by target nucleotide sequence, novel site-specific restriction enzymes could be evolved. As before, the rate of replication (and hence the amount of nucleic acid encoding the agent) is dependent on the activity of the modulating agent. Alternatively the modulator can modify the 5′ end of a primer such that amplification products incorporating the primer can be captured by a suitable agent (e.g. antibody) and thus enriched and reamplified.

In a further embodiment, the scope of CSR may be further broadened to select for agents that are not necessarily thermostable. Delivery vehicles (e.g. E. coli) containing expression constructs that encode a secretable form of a modulator/replicase of interest are compartmentalised. Inclusion of an inducing agent in the aqueous phase and incubation at permissive temperature (e.g. 37° C.) allows for expression and secretion of the modulator/replicase into the compartment. Sufficient time is then allowed for the modulator to act in any of the aforementioned ways to facilitate subsequent amplification of the gene encoding it (e.g. consume an inhibitor of replication). The ensuing temperature change during the amplification process serves to rid the compartment of host cell enzymatic activities (that have up to this point been segregated from the aqueous phase) and release the encoding gene for amplification.

Thus, according to an embodiment of our invention, we provide a method of selecting a polypeptide involved in a pathway which has as an end product a substrate which is involved in a replication reaction (“a pathway polypeptide”), the method comprising the steps of: (a) providing a replicase; (b) providing a pool of nucleic acids comprising members each encoding a pathway polypeptide or a variant of the pathway polypeptide; (c) subdividing the pool of nucleic acids into compartments, such that each compartment comprises a nucleic acid member of the pool, the pathway polypeptide or variant encoded by the nucleic acid member, the replicase, and other components of the pathway; and (d) detecting amplification of the nucleic acid member by the replicase.

The Examples (in particular Example 19 and following Examples) show the use of our invention in the selection of nucleoside diphosphate kinase (NDP Kinase), which catalyses the transfer of a phosphate group from ATP to a deoxynucleoside diphosphate to produce a deoxynucleoside triphosphate.

In yet another embodiment, the modulating agent is such that it consumes an inhibitor of replicase activity. For example, it is known that heparin is an inhibitor of replicase (polymerase) activity. Our method allows the selection of a heparinase with enhanced activity, by compartmentalisation of a library of nucleic acids encoding heparinase or variants of this enzyme, in the presence of heparin and polymerase. Heparinase variants with enhanced activity are able to break down heparin to a greater extent or more rapidly, thus removing the inhibition of replicase activity within the compartment and allowing the replication of the nucleic acid within the compartment (i.e., the nucleic acid encoding that heparinase variant).

Selection of Interacting Polypeptides

The most important systems for the selection of protein-protein interactions are in vivo methods, the best developed being the yeast two-hybrid system (Fields and Song, 1989, Nature 340, 245-246). In this system and related approaches two hybrid proteins are generated: a bait-hybrid comprising protein X fused to a DNA-binding domain and a prey-hybrid comprising protein Y fused to a transcription activation domain, with cognate interaction of X and Y reconstituting the transcriptional activator. Two other in vivo systems have been put forward in which the polypeptide chain of an enzyme is expressed in two parts fused to two proteins X and Y and in which cognate X-Y interaction reconstitutes function of the enzyme (Karimova, 1998, Proc. Natl. Acad. Sci. USA 95, 5752-6; Pelletier, 1999, Nat. Biotechnol. 17, 683-690), conferring a selectable phenotype on the cell.

It has recently been shown that Taq polymerase can be split in a similar way (Vainshtein et al., 1996, Protein Science 5, 1785). According to our invention, therefore, we provide a method of selecting a pair of polypeptides capable of stable interaction by splitting Taq polymerase or any enzyme or factor auxiliary to the polymerase reaction.

The method comprises several steps. The first step consists of providing a first nucleic acid and a second nucleic acid. The first nucleic acid encodes a first fusion protein comprising a first subdomain of a replicase (or other enzyme; see above) fused to a first polypeptide, while the second nucleic acid encodes a second fusion protein comprising a second subdomain of a replicase (or other enzyme; see above) fused to a second polypeptide. The two fusion proteins are such that stable interaction of the first and second subdomains of the replicase (or other enzyme; see above) generates replicase activity (either directly or indirectly). At least one of the first and second nucleic acids (preferably both) is provided in the form of a pool of nucleic acids encoding variants of the respective first and/or second polypeptide(s).

The pool or pools of nucleic acids are then subdivided intocompartments, such that each compartment comprises a first nucleic acidand a second nucleic acid together with respective fusion proteinsencoded by the first and second nucleic acids. The first polypeptide isthen allowed to bind to the second polypeptide, such that binding of thefirst and second polypeptides leads to stable interaction of thereplicase subdomains to generate replicase activity. Finally,amplification of at least one of the first and second nucleic acids bythe replicase is detected.

Our invention therefore encompasses an in vitro selection system whereby reconstitution of replicase function through the cognate association of two polypeptide ligands drives amplification and linkage of the genes of the two ligands. Such an in vitro two-hybrid system is particularly suited for the investigation of protein-protein interactions at high temperatures, e.g. for the investigation of the proteomes of thermophilic organisms or the engineering of highly stable interactions.

The system can also be applied to the screening and isolation of molecular compounds that promote cognate interactions. For example, compounds can be chemically linked to either primers or dNTPs and thus would only be incorporated into amplicons if promoting association. In order to prevent cross-over, such compounds would have to be released only after compartmentalisation has taken place, e.g. by coupling to microbeads or by inclusion into dissolvable microspheres.

Single Step and Multiple Step Selections

The selection of suitable encapsulation conditions is desirable. Depending on the complexity and size of the library to be screened, it may be beneficial to set up the encapsulation procedure such that, on average, one or fewer nucleic acids is encapsulated per microcapsule or compartment. This will provide the greatest power of resolution. Where the library is larger and/or more complex, however, this may be impracticable; it may be preferable to encapsulate or compartmentalise several nucleic acids together and rely on repeated application of the method of the invention to achieve sorting of the desired activity. A combination of encapsulation procedures may be used to obtain the desired enrichment.

Theoretical studies indicate that the larger the number of nucleic acid variants created, the more likely it is that a molecule will be created with the properties desired (see Perelson and Oster, 1979, J. Theor. Biol. 81, 645-70 for a description of how this applies to repertoires of antibodies). Recently it has also been confirmed practically that larger phage-antibody repertoires do indeed give rise to more antibodies with better binding affinities than smaller repertoires (Griffiths et al., 1994, Embo. J. 13, 3245-60). To ensure that rare variants are generated and thus are capable of being selected, a large library size is desirable. Thus, the use of optimally small microcapsules is beneficial.

In addition to the nucleic acids described above, the microcapsules orcompartments according to the invention may comprise further componentsrequired for the replication reaction to take place. Other components ofthe system may for example comprise those necessary for transcriptionand/or translation of the nucleic acid. These are selected for therequirements of a specific system from the following: a suitable buffer,an in vitro transcription/replication system and/or an in vitrotranslation system containing all the necessary ingredients, enzymes andcofactors, RNA polymerase, nucleotides, nucleic acids (natural orsynthetic), transfer RNAs, ribosomes and amino acids, and the substratesof the reaction of interest in order to allow selection of the modifiedgene product.

Buffer

A suitable buffer will be one in which all of the desired components ofthe biological system are active and will therefore depend upon therequirements of each specific reaction system. Buffers suitable forbiological and/or chemical reactions are known in the art and recipesprovided in various laboratory texts (Sambrook et al., 1989, Molecularcloning: a laboratory manual. Cold Spring Harbor Laboratory Press, NewYork).

In Vitro Translation

The replicase may be provided by expression from a suitable host asdescribed elsewhere, or it may be produced by in vitrotranscription/translation in a suitable system as known in the art.

The in vitro translation system will usually comprise a cell extract, typically from bacteria (Zubay, 1973, Annu. Rev. Genet. 7, 267-87; Zubay, 1980, Methods Enzymol. 65, 856-77; Lesley et al., 1991, J. Biol. Chem. 266 (4), 2632-8; Lesley, 1995, Methods Mol. Biol. 37, 265-78), rabbit reticulocytes (Pelham and Jackson, 1976, Eur. J. Biochem. 67, 247-56), or wheat germ (Anderson et al., 1983, Methods Enzymol. 101, 635-44). Many suitable systems are commercially available (for example from Promega), including some which allow coupled transcription/translation (all the bacterial systems and the reticulocyte and wheat germ TNT™ extract systems from Promega). The mixture of amino acids used may include synthetic amino acids if desired, to increase the possible number or variety of proteins produced in the library. This can be accomplished by charging tRNAs with artificial amino acids and using these tRNAs for the in vitro translation of the proteins to be selected (Ellman et al., 1991, Methods Enzymol. 202, 301-36; Benner, 1994, Trends Biotechnol. 12, 158-63; Mendel et al., 1995, Annu. Rev. Biophys. Biomol. Struc. 24, 435-62). Particularly desirable may be the use of in vitro translation systems reconstituted from purified components, like the PURE system (Shimizu et al., 2001, Nat. Biotech. 19, 751).

After each round of selection the enrichment of the pool of nucleicacids for those encoding the molecules of interest can be assayed bynon-compartmentalised in vitro transcription/replication or coupledtranscription-translation reactions. The selected pool is cloned into asuitable plasmid vector and RNA or recombinant protein is produced fromthe individual clones for further purification and assay.

The invention moreover relates to a method for producing a gene product,once a nucleic acid encoding the gene product has been selected by themethod of the invention. Clearly, the nucleic acid itself may bedirectly expressed by conventional means to produce the gene product.However, alternative techniques may be employed, as will be apparent tothose skilled in the art. For example, the genetic informationincorporated in the gene product may be incorporated into a suitableexpression vector, and expressed therefrom.

Compartments

As used here, the term “compartment” is synonymous with “microcapsule”and the terms are used interchangeably. The function of the compartmentis to enable co-localisation of the nucleic acid and the correspondingpolypeptide encoded by the nucleic acid. This is preferably achieved bythe ability of the compartment to substantially restrict diffusion oftemplate and product strands to other compartments. Any replicaseactivity of the polypeptide is therefore restricted to being exercisedon a nucleic acid within the confines of a compartment, and not othernucleic acids in other compartments. Another function of compartments isto restrict diffusion of molecules generated in a chemical or enzymaticreaction that feed or unblock a replication reaction.

The compartments of the present invention therefore require appropriatephysical properties to allow the working of the invention.

First, to ensure that the nucleic acids and polypeptides do not diffusebetween compartments, the contents of each compartment must be isolatedfrom the contents of the surrounding compartments, so that there is noor little exchange of the nucleic acids and polypeptides between thecompartments over a significant timescale.

Second, the method of the present invention requires that there are onlya limited number of nucleic acids per compartment, or that all memberswithin a single compartment are clonal (i.e. identical). This ensuresthat the polypeptide encoded by and corresponding to an individualnucleic acid will be isolated from other different nucleic acids. Thus,coupling between nucleic acid and its corresponding polypeptide will behighly specific. The enrichment factor is greatest with on average oneor fewer nucleic acid clonal species per compartment, the linkagebetween nucleic acid and the activity of the encoded polypeptide beingas tight as is possible, since the polypeptide encoded by an individualnucleic acid will be isolated from the products of all other nucleicacids. However, even if the theoretically optimal situation of, onaverage, a single nucleic acid or less per compartment is not used, aratio of 5, 10, 50, 100 or 1000 or more nucleic acids per compartmentmay prove beneficial in selecting from a large library. Subsequentrounds of selection, including renewed compartmentalisation withdiffering nucleic acid distribution, will permit more stringentselection of the nucleic acids. Preferably, on average there is a singlenucleic acid clonal species, or fewer, per compartment.

Moreover, each compartment contains a nucleic acid; this means thatwhilst some compartments may remain empty, the conditions are adjustedsuch that, statistically, each compartment will contain at least one,and preferably only one, nucleic acid.
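The effect of the mean number of nucleic acids per compartment on the tightness of the genotype-phenotype linkage can be illustrated numerically. The following is a minimal sketch only, assuming (this assumption is ours and is not stated in the specification) that nucleic acids distribute randomly and independently between compartments, so that occupancy follows a Poisson distribution. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a compartment holds exactly k nucleic acids (Poisson assumption)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def occupancy(mean: float) -> dict:
    """Fractions of empty, singly and multiply occupied compartments for a given mean loading."""
    empty = poisson_pmf(0, mean)
    single = poisson_pmf(1, mean)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Among compartments that contain anything, the fraction holding exactly one member.
        "single_given_occupied": single / (1.0 - empty),
    }


if __name__ == "__main__":
    for mean in (0.1, 0.5, 1.0, 5.0):
        o = occupancy(mean)
        print(
            f"mean={mean:>4}: empty={o['empty']:.3f} single={o['single']:.3f} "
            f"multiple={o['multiple']:.3f} single|occupied={o['single_given_occupied']:.3f}"
        )
```

Under this assumption, lowering the mean loading increases the proportion of occupied compartments that hold a single nucleic acid, at the cost of more empty compartments; higher loadings trade linkage for library coverage, which is why repeated rounds of selection are then relied upon.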

Third, the formation and the composition of the compartments must notabolish the function of the machinery for the expression of the nucleicacids and the activity of the polypeptides.

Consequently, any compartmentalisation system used must fulfil thesethree requirements. The appropriate system(s) may vary depending on theprecise nature of the requirements in each application of the invention,as will be apparent to the skilled person.

Various technologies are available for compartmentalisation, for example, gas aphrons (Jauregi and Varley, 1998, Biotechnol. Bioeng. 59, 471) and prefabricated nanowells (Huang and Schreiber, 1997, Proc. Natl. Acad. Sci. USA 94, 25). For different applications, different compartment sizes and surface chemistries, as discussed in further detail below, may be desirable. For example, it may be sufficient to utilise diffusion-limiting porous materials like gels or alginate (Draget et al., 1997, Int. J. Macromol. 21, 47) or zeolite-type materials. Furthermore, where in-situ PCR or in-cell PCR is carried out, cells may be treated with a cross-linking fixative to form porous compartments allowing diffusion of dNTPs, enzymes and primers.

A wide variety of compartmentalisation or microencapsulation proceduresare available (Benita, S., Ed. (1996). Microencapsulation: methods andindustrial applications. Drugs and pharmaceutical sciences. Edited bySwarbrick, J. New York: Marcel Dekker) and may be used to create thecompartments used in accordance with the present invention. Indeed, morethan 200 microencapsulation or compartmentalisation methods have beenidentified in the literature (Finch, C. A., 1993, Encapsulation andcontrolled release. Spec. Publ-R. Soc. Chem. 138, 35).

These include membrane enveloped aqueous vesicles such as lipid vesicles(liposomes) (New, R. R. C., Ed. (1990). Liposomes: a practical approach.The practical approach series. Edited by Rickwood, D. & Hames, B. D.Oxford: Oxford University Press) and non-ionic surfactant vesicles (vanHal, D. A., Bouwstra, J. A. & Junginger, H. E. (1996). Nonionicsurfactant vesicles containing estradiol for topical application. InMicroencapsulation: methods and industrial applications (Benita, S.,ed.), pp. 329-347. Marcel Dekker, New York). These are closed-membranouscapsules of single or multiple bilayers of non-covalently assembledmolecules, with each bilayer separated from its neighbour by an aqueouscompartment. In the case of liposomes the membrane is composed of lipidmolecules; these are usually phospholipids but sterols such ascholesterol may also be incorporated into the membranes (New, R. R. C.,Ed. (1990). Liposomes: a practical approach. The practical approachseries. Edited by Rickwood, D. & Hames, B. D. Oxford: Oxford UniversityPress). A variety of enzyme-catalysed biochemical reactions, includingRNA and DNA polymerisation, can be performed within liposomes(Chakrabarti, 1994, J. Mol. Evol. 39, 555-9; Oberholzer, 1995, Biochem.Biophys. Res. Commun. 207, 250-7; Oberholzer, 1995, Chem. Biol. 2,677-82; Walde, 1998, Biotechnol. Bioeng. 57, 216-219; Wick and Luisi,1996, Chem. Biol. 3, 277-85).

With a membrane-enveloped vesicle system much of the aqueous phase isoutside the vesicles and is therefore non-compartmentalised. Thiscontinuous, aqueous phase should be removed or the biological systems init inhibited or destroyed (for example, by digestion of nucleic acidswith DNase or RNase) in order that the reactions are limited to thecompartmentalised microcapsules (Luisi et al., 1987, Methods Enzymol.136, 188-216).

Enzyme-catalysed biochemical reactions have also been demonstrated in microcapsule compartments generated by a variety of other methods. Many enzymes are active in reverse micellar solutions (Bru and Walde, 1991, Eur. J. Biochem. 199, 95-103; Bru and Walde, 1993, Biochem. Mol. Biol. Int. 31, 685-92; Creagh et al., 1993, Enzyme Microb. Technol. 15, 383-92; Haber et al., 1993; Kumar et al., 1989, Biophys. J. 55, 789-792; Luisi, P. L. and B., S.-H., 1987, Activity and conformation of enzymes in reverse micellar solutions. Methods Enzymol. 136, 188-216; Mao, Q. and Walde, P., 1991, Substrate effects on the enzymatic activity of alpha-chymotrypsin in reverse micelles. Biochem. Biophys. Res. Commun. 178 (3), 1105-12; Mao, 1992, Eur. J. Biochem. 208, 165-70; Perez, G. M., Sanchez, F. A. and Garcia, C. F., 1992, Application of active-phase plot to the kinetic analysis of lipoxygenase in reverse micelles. Biochem. J.; Walde, P., Goto, A., Monnard, P.-A., Wessicken, M. and Luisi, P. L., 1994, Oparin's reactions revisited: enzymatic synthesis of poly(adenylic acid) in micelles and self-reproducing vesicles. J. Am. Chem. Soc. 116, 7541-7547; Walde, P., Han, D. and Luisi, P. L., 1993, Spectroscopic and kinetic studies of lipases solubilized in reverse micelles. Biochemistry 32, 4029-34; Walde, 1988, Eur. J. Biochem. 173, 401-9) such as the AOT-isooctane-water system (Menger, F. M. and Yamada, K., 1979, J. Am. Chem. Soc. 101, 6731-6734).

Compartments can also be generated by interfacial polymerisation andinterfacial complexation (Whateley, T. L., 1996, Microcapsules:preparation by interfacial polymerisation and interfacial complexationand their applications. In Microencapsulation: methods and industrialapplications (Benita, S., ed.), pp. 349-375. Marcel Dekker, New York).Microcapsule compartments of this sort can have rigid, nonpermeablemembranes, or semipermeable membranes. Semipermeable microcapsulesbordered by cellulose nitrate membranes, polyamide membranes andlipid-polyamide membranes can all support biochemical reactions,including multienzyme systems (Chang, 1987, Methods Enzymol. 136, 67-82;Chang, 1992, Artif. Organs 16, 71-4; Lim, 1984, Appl. Biochem.Biotechnol. 10, 81-5). Alginate/polylysine compartments (Lim and Sun,1980, Science 210, 908-10), which can be formed under very mildconditions, have also proven to be very biocompatible, providing, forexample, an effective method of encapsulating living cells and tissues(Chang, 1992, Artif Organs 16, 71-4; Sun, 1992, ASAIO J. 38, 125-7).

Non-membranous compartmentalisation systems based on phase partitioningof an aqueous environment in a colloidal system, such as an emulsion,may also be used.

Preferably, the compartments of the present invention are formed fromemulsions; heterogeneous systems of two immiscible liquid phases withone of the phases dispersed in the other as droplets of microscopic orcolloidal size (Becher, P. (1957) Emulsions: theory and practice.Reinhold, New York; Sherman, P. (1968) Emulsion science. Academic Press,London; Lissant, K. J., ed. Emulsions and emulsion technology.Surfactant Science New York: Marcel Dekker, 1974; Lissant, K. J., ed.Emulsions and emulsion technology. Surfactant Science New York: MarcelDekker, 1984).

Emulsions may be produced from any suitable combination of immiscibleliquids. Preferably the emulsion of the present invention has water(containing the biochemical components) as the phase present in the formof finely divided droplets (the disperse, internal or discontinuousphase) and a hydrophobic, immiscible liquid (an “oil”) as the matrix inwhich these droplets are suspended (the nondisperse, continuous orexternal phase). Such emulsions are termed “water-in-oil” (W/O). Thishas the advantage that the entire aqueous phase containing thebiochemical components is compartmentalised in discrete droplets (theinternal phase). The external phase, being a hydrophobic oil, generallycontains none of the biochemical components and hence is inert.

The emulsion may be stabilised by addition of one or more surface-active agents (surfactants). These surfactants are termed emulsifying agents and act at the water/oil interface to prevent (or at least delay) separation of the phases. Many oils and many emulsifiers can be used for the generation of water-in-oil emulsions; a recent compilation listed over 16,000 surfactants, many of which are used as emulsifying agents (Ash, M. and Ash, I. (1993) Handbook of industrial surfactants. Gower, Aldershot). Suitable oils include light white mineral oil. Suitable surfactants include non-ionic surfactants (Schick, 1966) such as sorbitan monooleate (Span™ 80; ICI), polyoxyethylenesorbitan monooleate (Tween™ 80; ICI) or t-octylphenoxypolyethoxyethanol (Triton X-100).

The use of anionic surfactants may also be beneficial. Suitable surfactants include sodium cholate and sodium taurocholate. Particularly preferred is sodium deoxycholate, preferably at a concentration of 0.5% w/v or below. Inclusion of such surfactants can in some cases increase the expression of the nucleic acids and/or the activity of the polypeptides. Addition of some anionic surfactants to a non-emulsified reaction mixture completely abolishes translation. During emulsification, however, the surfactant is transferred from the aqueous phase into the interface and activity is restored. Addition of an anionic surfactant to the mixtures to be emulsified therefore ensures that reactions proceed only after compartmentalisation.

Creation of an emulsion generally requires the application of mechanical energy to force the phases together. There are a variety of ways of doing this which utilise a variety of mechanical devices, including stirrers (such as magnetic stir-bars, propeller and turbine stirrers, paddle devices and whisks), homogenisers (including rotor-stator homogenisers, high-pressure valve homogenisers and jet homogenisers), colloid mills, ultrasound and “membrane emulsification” devices (Becher, P. (1957) Emulsions: theory and practice. Reinhold, New York; Dickinson, E. (1994) In Wedlock, D. J. (ed.), Emulsions and droplet size control. Butterworth-Heinemann, Oxford, Vol. pp. 191-257).

Aqueous compartments formed in water-in-oil emulsions are generallystable with little if any exchange of polypeptides or nucleic acidsbetween compartments. Additionally, it is known that several biochemicalreactions proceed in emulsion compartments. Moreover, complicatedbiochemical processes, notably gene transcription and translation arealso active in emulsion microcapsules. The technology exists to createemulsions with volumes all the way up to industrial scales of thousandsof litres (Becher, P. (1957) Emulsions: theory and practice. Reinhold,New York; Sherman, P. (1968) Emulsion science. Academic Press, London;Lissant, K. J., ed. Emulsions and emulsion technology. SurfactantScience New York: Marcel Dekker, 1974; Lissant, K. J., ed. Emulsions andemulsion technology. Surfactant Science New York; Marcel Dekker, 1984).

The preferred compartment size will vary depending upon the preciserequirements of any individual selection process that is to be performedaccording to the present invention. In all cases, there will be anoptimal balance between gene library size, the required enrichment andthe required concentration of components in the individual compartmentsto achieve efficient expression and reactivity of the polypeptides.

The processes of expression may occur either in situ within each individual microcapsule or exogenously within cells (e.g. bacteria) or other suitable forms of subcompartmentalization. Both in vitro transcription and coupled transcription-translation become less efficient at sub-nanomolar DNA concentrations. Because only a limited number of DNA molecules may be present in each compartment, this sets a practical upper limit on the possible compartment size where in vitro transcription is used. Preferably, for expression in situ using in vitro transcription and/or translation, the mean volume of the compartments is less than 5.2×10⁻¹⁶ m³ (corresponding to a spherical compartment of diameter less than 10 μm).
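The relationship between compartment diameter, compartment volume and the molar concentration of a small number of DNA molecules can be sketched as follows. This is an illustrative calculation only, assuming ideal spherical compartments; the function names are hypothetical and the diameters in the loop are arbitrary examples.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def sphere_volume_litres(diameter_um: float) -> float:
    """Volume in litres of a spherical compartment of the given diameter (micrometres)."""
    diameter_m = diameter_um * 1e-6
    volume_m3 = math.pi / 6.0 * diameter_m ** 3
    return volume_m3 * 1000.0  # 1 m^3 = 1000 litres


def molar_concentration(n_molecules: float, diameter_um: float) -> float:
    """Molar concentration of n molecules confined in one spherical compartment."""
    return n_molecules / (AVOGADRO * sphere_volume_litres(diameter_um))


if __name__ == "__main__":
    for d in (1.0, 2.0, 5.0, 10.0):
        conc_nm = molar_concentration(1, d) * 1e9
        print(f"diameter {d:>4} um: one DNA molecule = {conc_nm:.4f} nM")
```

Because volume scales with the cube of the diameter, a tenfold increase in diameter dilutes a single template molecule a thousandfold, which is the reason compartment size is bounded from above when expression takes place in situ.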

An alternative is the separation of expression and compartmentalisation, e.g. using a cellular host. For inclusion of cells (in particular eucaryotic cells) mean compartment diameters of larger than 10 μm may be preferred.

As shown in the Examples, to colocalize the polymerase gene and encoded protein within the same emulsion compartment, we used bacteria (E. coli) overexpressing Taq polymerase as “delivery vehicles”. E. coli cells (diameter 1-5 μm) fit readily into our emulsion compartments while leaving room for sufficient amounts of PCR reagents like nucleotide triphosphates and primers (as shown in FIG. 2). The denaturation step of the first PCR cycle ruptures the bacterial cell and releases the expressed polymerase and its encoding gene into the compartment, allowing self-replication to proceed while simultaneously destroying background bacterial enzymatic activities. Furthermore, by analogy to hot-start strategies, this cellular “subcompartmentalization” prevents release of polymerase activity at ambient temperatures and the resulting non-specific amplification products.

The effective DNA or RNA concentration in the compartments may be artificially increased by various methods that will be well known to those versed in the art. These include, for example, the addition of volume-excluding chemicals such as polyethylene glycols (PEG) and a variety of gene amplification techniques, including transcription using RNA polymerases, including those from bacteria such as E. coli (Roberts, 1969, Nature 224, 1168-74; Blattner and Dahlberg, 1972, Nat. New Biol. 237, 227-32; Roberts et al., 1975, J. Biol. Chem. 250, 5530-41; Rosenberg et al., 1975, J. Biol. Chem. 250, 4755-4764), eukaryotes (e.g. Weil et al., 1979, J. Biol. Chem. 254, 6163-6173; Manley et al., 1983, Methods Enzymol. 101, 568-82) and bacteriophage such as T7, T3 and SP6 (Melton et al., 1984, Nucleic Acids Res. 12, 7035-56); the polymerase chain reaction (PCR) (Saiki et al., 1988, Science 239, 487-91); Qβ replicase amplification (Miele et al., 1983, J. Mol. Biol. 171, 281-95; Cahill et al., 1991, Clin. Chem. 37, 1482-5; Chetverin and Spirin, 1995, Prog. Nucleic Acid Res. Mol. Biol. 51, 225-70; Katanaev et al., 1995, FEBS Lett. 359, 89-92); the ligase chain reaction (LCR) (Landegren et al., 1988, Science 241, 1077-80; Barany, 1991, PCR Methods Appl. 1, 5-16); the self-sustained sequence replication system (Fahy et al., 1991, PCR Methods Appl. 1, 25-33); and strand displacement amplification (Walker et al., 1992, Nucleic Acids Res. 20, 1691-6). Gene amplification techniques requiring thermal cycling, such as PCR and LCR, may also be used if the emulsions and the in vitro transcription or coupled transcription-translation systems are thermostable (for example, the coupled transcription-translation systems could be made from a thermostable organism such as Thermus aquaticus).

Increasing the effective local nucleic acid concentration enables largercompartments to be used effectively.

The compartment size must be sufficiently large to accommodate all of the required components of the biochemical reactions that need to occur within the compartment. For example, in vitro, both transcription reactions and coupled transcription-translation reactions require a total nucleoside triphosphate concentration of about 2 mM.

For example, transcribing a gene to a single short RNA molecule of 500 bases in length would require a minimum of 500 molecules of nucleoside triphosphate per compartment (8.33×10⁻²² moles). In order to constitute a 2 mM solution, this number of molecules must be contained within a compartment of volume 4.17×10⁻¹⁹ litres (4.17×10⁻²² m³), which if spherical would have a diameter of 93 nm. Hence, the preferred lower limit for microcapsules is a diameter of approximately 0.1 μm (100 nm).
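The arithmetic behind this lower limit can be reproduced with a short calculation. The sketch below assumes an ideal spherical compartment and uses the current value of the Avogadro constant, so the mole figure differs slightly (in the third significant digit) from the rounded value quoted in the text; the function name and return layout are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def minimum_compartment(n_ntp: int = 500, ntp_molar: float = 2e-3) -> dict:
    """Smallest spherical compartment holding n_ntp nucleoside triphosphates at ntp_molar."""
    moles = n_ntp / AVOGADRO
    volume_litres = moles / ntp_molar          # V = n / c
    volume_m3 = volume_litres * 1e-3           # 1 litre = 1e-3 m^3
    diameter_m = (6.0 * volume_m3 / math.pi) ** (1.0 / 3.0)  # V = (pi/6) d^3
    return {
        "moles": moles,
        "volume_litres": volume_litres,
        "volume_m3": volume_m3,
        "diameter_nm": diameter_m * 1e9,
    }


if __name__ == "__main__":
    r = minimum_compartment()
    print(f"moles of NTP : {r['moles']:.3e}")
    print(f"volume       : {r['volume_litres']:.3e} L = {r['volume_m3']:.3e} m^3")
    print(f"diameter     : {r['diameter_nm']:.1f} nm")
```

The result, a volume of roughly 4.2×10⁻²² m³ and a diameter just under 93 nm, agrees with the figures given above to within rounding.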

When using expression hosts as delivery vehicles, the requirements on the compartment size are much less strict. Basically, the compartment has to be of sufficient size to contain the expression host as well as sufficient amounts of reagents to carry out the required reactions. Thus, in such cases larger compartment sizes (>10 μm) are preferred. By an appropriate choice of vector used for expression in the host, the template concentration within compartments can be controlled via the vector origin and resulting copy number (e.g. in E. coli: colE (pUC) >100, p15: 30-50, pSC101: 1-4). Likewise, the concentration of the gene product can be controlled by the choice of expression promoter and expression protocol (e.g. full induction of expression versus promoter leakage). Preferably, gene product concentration is as high as possible.

Furthermore, the use of feeder compartments allows feeding of substrates from the outside (see Ghadessy et al., 2001, PNAS 98, 4552). Feeding emulsion reactions from the outside may allow compartment dimensions <0.1 μm for ribozyme selections, as reagents do not need to be contained in their entirety within the compartment.

The size of emulsion microcapsules or compartments may be varied simplyby tailoring the emulsion conditions used to form the emulsion accordingto requirements of the selection system. The larger the compartmentsize, the larger is the volume that will be required to encapsulate agiven nucleic acid library, since the ultimately limiting factor will bethe size of the compartment and thus the number of microcapsulecompartments possible per unit volume.
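The dependence of the number of compartments on compartment size can be estimated as follows. This is a rough sketch only, assuming that all droplets are spherical and of identical diameter (a real emulsion has a size distribution); the function name is hypothetical and the example volumes and diameters are chosen purely for illustration.

```python
import math


def compartments_per_volume(aqueous_ul: float, diameter_um: float) -> float:
    """Approximate number of identical spherical droplets formed from an aqueous volume."""
    droplet_m3 = math.pi / 6.0 * (diameter_um * 1e-6) ** 3
    droplet_litres = droplet_m3 * 1000.0
    return (aqueous_ul * 1e-6) / droplet_litres


if __name__ == "__main__":
    for d in (2.0, 5.0, 15.0):
        n = compartments_per_volume(200.0, d)
        print(f"200 ul aqueous phase, {d:>4} um droplets: about {n:.2e} compartments")
```

Since the droplet count scales with the inverse cube of the diameter, halving the compartment diameter gives eight times as many compartments from the same aqueous volume, and hence room for a correspondingly larger library at a given loading.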

The size of the compartments is selected not only having regard to therequirements of the replication system, but also those of the selectionsystem employed for the nucleic acid.

Thus, the components of the selection system, such as a chemicalmodification system, may require reaction volumes and/or reagentconcentrations, which are not optimal for replication. As set forthherein, such requirements may be accommodated by a secondaryre-encapsulation step; moreover, they may be accommodated by selectingthe compartment size in order to maximise replication and selection as awhole. Empirical determination of optimal compartment volume and reagentconcentration, for example, as set forth herein, is preferred.

In a highly preferred embodiment of the present invention, the emulsionis a water-in-oil emulsion. The water-in-oil emulsion is made by addingan aqueous phase dropwise to an oil phase in the presence of asurfactant comprising 4.5% (v/v) Span 80, about 0.4% (v/v) Tween 80 andabout 0.05-0.1% (v/v) Triton X100 in mineral oil preferably at a ratioof oil:water phase of 2:1 or 3:1. It appears that the ratio of the threesurfactants is important for the advantageous properties of theemulsion, and accordingly, our invention also encompasses a water-in-oilemulsion having increased amounts of surfactant but with substantiallythe same ratio of Span 80, Tween 80 and Triton X100. In a preferredembodiment, the surfactant comprises 4.5% (v/v) Span 80, 0.4% (v/v)Tween 80 and 0.05% (v/v) Triton X100.

The water-in-oil emulsion is preferably formed under constant stirring in 2 ml round bottom biofreeze vials with continued stirring at 1000 rpm for a further 4 or 5 minutes after complete addition of the aqueous phase. The rate of addition may be up to 12 drops/min (ca. 10 μl each). The aqueous phase may include just water, or it may comprise a buffered solution having additional components such as nucleic acids, nucleotide triphosphates, etc. In a preferred embodiment, the aqueous phase comprises a PCR reaction mix as disclosed elsewhere in this document, as well as nucleic acid and polymerase. The water-in-oil emulsion may be formed from 200 μl of aqueous phase (for example PCR reaction mix) and 400 μl oil phase as described above.
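The quantities involved in preparing such an emulsion can be worked through numerically. The sketch below makes two assumptions that are ours and are not stated in the specification: that the surfactant percentages are v/v of the final oil phase with mineral oil making up the remainder, and that the rate of addition is expressed in drops per minute. Function names are hypothetical.

```python
def oil_phase_recipe(oil_phase_ul: float,
                     span80_pct: float = 4.5,
                     tween80_pct: float = 0.4,
                     triton_pct: float = 0.05) -> dict:
    """Component volumes (ul) for an oil phase, treating percentages as v/v of the total."""
    span = oil_phase_ul * span80_pct / 100.0
    tween = oil_phase_ul * tween80_pct / 100.0
    triton = oil_phase_ul * triton_pct / 100.0
    return {
        "span80_ul": span,
        "tween80_ul": tween,
        "triton_x100_ul": triton,
        "mineral_oil_ul": oil_phase_ul - (span + tween + triton),
    }


def addition_time_min(aqueous_ul: float,
                      drop_ul: float = 10.0,
                      drops_per_min: float = 12.0) -> float:
    """Minimum time to add the aqueous phase dropwise at the maximum stated rate."""
    return aqueous_ul / drop_ul / drops_per_min


if __name__ == "__main__":
    for name, volume in oil_phase_recipe(400.0).items():
        print(f"{name:>15}: {volume:.1f} ul")
    print(f"addition of 200 ul takes at least {addition_time_min(200.0):.2f} min")
```

For the 400 μl oil phase and 200 μl aqueous phase mentioned above, this corresponds to 20 drops of about 10 μl each, added over somewhat less than two minutes before the further period of stirring.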

The water-in-oil emulsion according to the invention has advantageous properties of increased thermal stability. Thus, no change in compartment size and no evidence of coalescence is observed after 20 cycles of PCR, as judged by laser diffraction and light microscopy. This is shown in FIG. 2. In addition, the polymerase chain reaction proceeded efficiently within the compartments of this water-in-oil composition, approaching the rates observed in solution PCR. Aqueous compartments in the water-in-oil emulsion according to our invention are on average 15 μm in size. Once formed, the compartments of the emulsion according to our invention do not permit the exchange of macromolecules like DNA and proteins to any significant degree (as shown in FIG. 3A). This is presumably because the large molecular weight and charged nature of the macromolecules precludes diffusion across the hydrophobic surfactant shell, even at elevated temperatures.

Nucleic Acids

A nucleic acid in accordance with the present invention is as describedabove. Preferably, the nucleic acid is a molecule or construct selectedfrom the group consisting of a DNA molecule, an RNA molecule, apartially or wholly artificial nucleic acid molecule consisting ofexclusively synthetic or a mixture of naturally-occurring and syntheticbases, any one of the foregoing linked to a polypeptide, and any one ofthe foregoing linked to any other molecular group or construct.Advantageously, the other molecular group or construct may be selectedfrom the group consisting of nucleic acids, polymeric substances,particularly beads, for example polystyrene beads, magnetic substancessuch as magnetic beads, labels, such as fluorophores or isotopic labels,chemical reagents, binding agents such as macrocycles and the like.

The nucleic acid may comprise suitable regulatory sequences, such asthose required for efficient expression of the gene product, for examplepromoters, enhancers, translational initiation sequences,polyadenylation sequences, splice sites and the like.

The terms “isolating”, “sorting” and “selecting”, as well as variationsthereof, are used herein. Isolation, according to the present invention,refers to the process of separating an entity from a heterogeneouspopulation, for example a mixture, such that it is free of at least onesubstance with which it is associated before the isolation process. In apreferred embodiment, isolation refers to purification of an entityessentially to homogeneity. Sorting of an entity refers to the processof preferentially isolating desired entities over undesired entities. Inas far as this relates to isolation of the desired entities, the terms“isolating” and “sorting” are equivalent. The method of the presentinvention permits the sorting of desired nucleic acids from pools(libraries or repertoires) of nucleic acids which contain the desirednucleic acid. Selecting is used to refer to the process (including thesorting process) of isolating an entity according to a particularproperty thereof.

“Oligonucleotide” refers to a molecule comprised of two or moredeoxyribonucleotides or ribonucleotides, preferably more than three. Theexact size of the oligonucleotide will depend on the ultimate functionor use of the oligonucleotide. The oligonucleotide may be derivedsynthetically or by cloning.

The nucleic acids selected according to our invention may be further manipulated. For example, nucleic acids encoding selected replicase or interacting polypeptides are incorporated into a vector, and introduced into suitable host cells to produce transformed cell lines that express the gene product. The resulting cell lines can then be propagated for reproducible qualitative and/or quantitative analysis of the effect(s) of potential drugs affecting gene product function. Thus gene product expressing cells may be employed for the identification of compounds, particularly small molecular weight compounds, which modulate the function of gene product. Thus host cells expressing gene product are useful for drug screening and it is a further object of the present invention to provide a method for identifying compounds which modulate the activity of the gene product, said method comprising exposing cells containing heterologous DNA encoding gene product, wherein said cells produce functional gene product, to at least one compound or mixture of compounds or signal whose ability to modulate the activity of said gene product is sought to be determined, and thereafter monitoring said cells for changes caused by said modulation. Such an assay enables the identification of modulators, such as agonists, antagonists and allosteric modulators, of the gene product. As used herein, a compound or signal that modulates the activity of gene product refers to a compound that alters the activity of gene product in such a way that the activity of gene product is different in the presence of the compound or signal (as compared to the absence of said compound or signal).

Cell-based screening assays can be designed by constructing cell lines in which the expression of a reporter protein, i.e. an easily assayable protein, such as β galactosidase, chloramphenicol acetyltransferase (CAT) or luciferase, is dependent on gene product. Such an assay enables the detection of compounds that directly modulate gene product function, such as compounds that antagonise gene product, or compounds that inhibit or potentiate other cellular functions required for the activity of gene product.

The present invention also provides a method to exogenously affect gene product dependent processes occurring in cells. Recombinant gene product producing host cells, e.g. mammalian cells, can be contacted with a test compound, and the modulating effect(s) thereof can then be evaluated by comparing the gene product-mediated response in the presence and absence of test compound, or relating the gene product-mediated response of test cells, or control cells (i.e., cells that do not express gene product), to the presence of the compound.

Nucleic Acid Libraries

The method of the present invention is useful for sorting libraries of nucleic acids. Herein, the terms “library”, “repertoire” and “pool” are used according to their ordinary signification in the art, such that a library of nucleic acids encodes a repertoire of gene products. In general, libraries are constructed from pools of nucleic acids and have properties which facilitate sorting. Initial selection of a nucleic acid from a library of nucleic acids using the present invention will in most cases require the screening of a large number of variant nucleic acids. Libraries of nucleic acids can be created in a variety of different ways, including the following.

Pools of naturally occurring nucleic acids can be cloned from genomic DNA or cDNA (Sambrook et al., 1989, Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, New York); for example, phage antibody libraries, made by PCR amplification of repertoires of antibody genes from immunised or unimmunised donors, have proved very effective sources of functional antibody fragments (Winter et al., 1994, Annu. Rev. Immunol. 12, 433-55; Hoogenboom, H. R., 1997, Trends Biotechnol. 15, 62-70). Libraries of genes can also be made by encoding all (see for example Smith, G. P., 1985, Science 228, 1315-7; Parmley, S. F. and Smith, G. P., 1988, Gene 73, 305-18) or part of genes (see for example Lowman et al., 1991, Biochemistry 30, 10832-8) or pools of genes (see for example Nissim, A., Hoogenboom et al., 1994, Embo J. 13, 692-8) by a randomised or doped synthetic oligonucleotide. Libraries can also be made by introducing mutations into a nucleic acid or pools of nucleic acids “randomly” by a variety of techniques in vivo, including: using “mutator strains” of bacteria such as E. coli mutD5 (Liao et al., 1986, Proc. Natl. Acad. Sci. USA 83, 576-80; Yamagishi et al., 1990, Protein Eng. 3, 713-9; Low et al., 1996, J. Mol. Biol. 260, 359-68); and using the antibody hypermutation system of B-lymphocytes (Yelamos et al., 1995, Nature 376, 225-9). Random mutations can also be introduced both in vivo and in vitro by chemical mutagens, and ionising or UV irradiation (see Friedberg et al., 1995, DNA repair and mutagenesis. ASM Press, Washington D.C.), or incorporation of mutagenic base analogues (Freese, 1959, J. Mol. Biol. 1, 87; Zaccolo et al., 1996, J. Mol. Biol. 255, 589-603). “Random” mutations can also be introduced into genes in vitro during polymerisation, for example by using error-prone polymerases (Leung et al., 1989, Technique 1, 11-15).

Further diversification can be introduced by using homologous recombination either in vivo (Kowalczykowski et al., 1994, Microbiol. Rev. 58, 401-65) or in vitro (Stemmer, 1994, Nature 370, 389-91; Stemmer, 1994, Proc. Natl. Acad. Sci. USA 91, 10747-51).

Agent

As used herein, the term “agent” includes but is not limited to an atom or molecule, wherein a molecule may be inorganic or organic, a biological effector molecule and/or a nucleic acid encoding an agent such as a biological effector molecule, a protein, a polypeptide, a peptide, a nucleic acid, a peptide nucleic acid (PNA), a virus, a virus-like particle, a nucleotide, a ribonucleotide, a synthetic analogue of a nucleotide, a synthetic analogue of a ribonucleotide, a modified nucleotide, a modified ribonucleotide, an amino acid, an amino acid analogue, a modified amino acid, a modified amino acid analogue, a steroid, a proteoglycan, a lipid, a fatty acid and a carbohydrate. An agent may be in solution or in suspension (e.g., in crystalline, colloidal or other particulate form). The agent may be in the form of a monomer, dimer, oligomer, etc., or otherwise in a complex.

Polypeptide

As used herein, the terms “peptide”, “polypeptide” and “protein” refer to a polymer in which the monomers are amino acids and are joined together through peptide or disulfide bonds. “Polypeptide” refers to either a full-length naturally-occurring amino acid chain or a “fragment thereof” or “peptide”, such as a selected region of the polypeptide that binds to another protein, peptide or polypeptide in a manner modulatable by a ligand, or to an amino acid polymer, or a fragment or peptide thereof, which is partially or wholly non-natural. “Fragment thereof” thus refers to an amino acid sequence that is a portion of a full-length polypeptide, between about 8 and about 500 amino acids in length, preferably about 8 to about 300, more preferably about 8 to about 200 amino acids, and even more preferably about 10 to about 50 or 100 amino acids in length. “Peptide” refers to a short amino acid sequence that is 10-40 amino acids long, preferably 10-35 amino acids. Additionally, unnatural amino acids, for example, β-alanine, phenyl glycine and homoarginine may be included. Commonly encountered amino acids, which are not gene-encoded, may also be used in the present invention. All of the amino acids used in the present invention may be either the D- or L-optical isomer. The L-isomers are preferred. In addition, other peptidomimetics are also useful, e.g. in linker sequences of polypeptides of the present invention (see Spatola, 1983, in Chemistry and Biochemistry of Amino Acids, Peptides and Proteins, Weinstein, ed., Marcel Dekker, New York, p. 267). A “polypeptide binding molecule” is a molecule, preferably a polypeptide, protein or peptide, which has the ability to bind to another polypeptide, protein or peptide. Preferably, this binding ability is modulatable by a ligand.

The term “synthetic”, as used herein, means that the process or substance described does not ordinarily occur in nature. Preferably, a synthetic substance is defined as a substance which is produced by in vitro synthesis or manipulation.

The term “molecule” is used herein to refer to any atom, ion, molecule, macromolecule (for example polypeptide), or combination of such entities. The term “ligand” may be used interchangeably with the term “molecule”. Molecules according to the invention may be free in solution, or may be partially or fully immobilised. They may be present as discrete entities, or may be complexed with other molecules. Preferably, molecules according to the invention include polypeptides displayed on the surface of bacteriophage particles. More preferably, molecules according to the invention include libraries of polypeptides presented as integral parts of the envelope proteins on the outer surface of bacteriophage particles. Methods for the production of libraries encoding randomised polypeptides are known in the art and may be applied in the present invention. Randomisation may be total, or partial; in the case of partial randomisation, the selected codons preferably encode options for amino acids, and not for stop codons.

EXAMPLES

Example 1 Construction of Taq Polymerase Expression Plasmids

The Taq polymerase open reading frame is amplified by PCR from Thermus aquaticus genomic DNA using primers 1 & 2, cut with XbaI & SalI and ligated into pASK75 (Skerra A., 1994, Gene 151, 131) cut with XbaI & SalI. pASK75 is an expression vector which directs the synthesis of foreign proteins in E. coli under transcriptional control of the tetA promoter/operator.

Clones are screened for inserts using primers 3, 4 and assayed for expression of active Taq polymerase (Taq pol) (see below). The inactive Taq pol mutant D785H/E786V is constructed using Quickchange mutagenesis (Stratagene). The mutated residues are critical for activity (Doublie S. et al., 1998, Nature 391, 251; Kiefer J. R. et al., 1998, Nature 391, 304). Resulting clones are screened for mutation using PCR screening with primers 3, 5 and diagnostic digestion of the products with PmlI. Mutant clones are assayed for expression of active Taq pol (see below).

Example 2 Protein Expression and Activity Assay

Transformed TG1 cells are grown in 2×TY/0.1 mg/ml ampicillin. For expression, overnight cultures are diluted 1/100 into fresh 2×TY medium and grown to OD600=0.5 at 37° C. Protein expression is induced by addition of anhydrotetracycline to a final concentration of 0.2 μg/ml. After 4 hours further incubation at 37° C., cells are spun down, washed once, and re-suspended in an equal volume of 1× SuperTaq polymerase buffer (50 mM KCl, 10 mM Tris-HCl (pH9.0), 0.1% Triton X-100, 1.5 mM MgCl₂) (HT Biotechnology Ltd, Cambridge UK).

Washed cells are added directly to a PCR reaction mix (2 μl per 30 μl reaction volume) comprising template plasmid (20 ng), primers 4 and 5 (1 μM each), dNTPs (0.25 mM), 1× SuperTaq polymerase buffer, and overlaid with mineral oil. Reactions are incubated for 10 min at 94° C. to release Taq pol from the cells and then thermocycled with 30 cycles of the profile 94° C. (1 min), 55° C. (1 min), 72° C. (2 min).

Example 3 Emulsification of Amplification Reactions

Emulsification of reactions is carried out as follows. 200 μl of PCR reaction mix (Taq expression plasmid (200 ng), primers 3 and 4 (1 μM each), dNTPs (0.25 mM), Taq polymerase (10 units)) is added dropwise (12 drops/min) to the oil phase (mineral oil (Sigma)) in the presence of 4.5% (v/v) Span 80 (Fluka), 0.4% (v/v) Tween 80 (Sigma) and 0.05% (v/v) Triton X100 (Sigma) under constant stirring (1000 rpm) in 2 ml round bottom biofreeze vials (Costar, Cambridge Mass.). After complete addition of the aqueous phase, stirring is continued for a further 4 minutes. Emulsified mixtures are then transferred to 0.5 ml thin-walled PCR tubes (100 μl/tube) and PCR carried out using 25 cycles of the profile 94° C. (1 min), 60° C. (1 min), 72° C. (3 min) after an initial 5 min incubation at 94° C. Reaction mixtures are recovered by the addition of a double volume of ether, vortexing and centrifugation for 2 minutes prior to removal of the ether phase. Amplified product is visualised by gel electrophoresis on agarose gels using standard methods (see for example J. Sambrook, E. F. Fritsch, and T. Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second Edition, Books 1-3, Cold Spring Harbor Laboratory Press).

For emulsification of whole cells expressing Taq polymerase, the protocol is modified in the following way: Taq expression plasmid and Taq polymerase in the reaction cocktail are omitted and instead 5×10⁸ induced E. coli TG1 cells (harbouring the expressed Taq polymerase as well as the expression plasmid) are added together with the additive tetramethyl ammonium chloride (50 μM), and RNAse (0.05% w/v, Roche, UK). The number of PCR cycles is also reduced to 20.

Example 4 Self-Replication of the Full-Length wt Taq Gene

In order to test genotype-phenotype linkage during self-replication, we mixed cells expressing either wild-type Taq polymerase (wt Taq) or the poorly active (under the buffer conditions) Stoffel fragment (sf Taq) (F. C. Lawyer, et al., 1993, PCR Methods Appl. 2, 275-87) at a 1:1 ratio and subjected them to CSR either in solution or in emulsion. In solution the smaller sf Taq is amplified preferentially. However, in emulsion there is almost exclusive self-replication of the full-length wt Taq gene (FIG. 3B). The number of bacterial cells is adjusted such that the majority of emulsion compartments contain only a single cell. However, because cells are distributed randomly among compartments, it is unavoidable that a minor fraction will contain two or more cells. As compartments do not appear to exchange template DNA (FIG. 3A), the small amount of sf Taq amplification in emulsion is likely to originate from these compartments. Clearly, their abundance is low and, as such, unlikely to affect selections. Indeed, in a test selection, a single round of CSR is sufficient to isolate wt Taq clones from a 10⁶-fold excess of an inactive Taq mutant.

Using error-prone PCR, we prepared two repertoires of random Taq mutants (L1 (J. P. Vartanian, M. Henry, S. Wain-Hobson, 1996, Nucleic Acids Res. 24, 2627-2631) and L2 (M. Zaccolo, E. Gherardi, 1999, J. Mol. Biol. 285, 775-83)). Only 1-5% of L1 or L2 clones are active, as judged by PCR, but a single round of CSR selection for polymerase activity under standard PCR conditions increased the proportion of active clones to 81% (L1*) and 77% (L2*).

Example 5 Mutagenic PCR

Taq polymerase gene variants are constructed using two different methods of error-prone PCR.

The first utilises the nucleoside analogues dPTP and dLTP (Zaccolo et al., 1996, J. Mol. Biol. 255, 589-603). Briefly, a 3-cycle PCR reaction comprising 50 mM KCl, 10 mM Tris-HCl (pH9.0), 0.1% Triton X-100, 2 mM MgCl₂, dNTPs (500 μM), dPTP (500 μM), dLTP (500 μM), 1 pM template DNA, primers 8 and 9 (1 μM each), Taq polymerase (2.5 units) in a total volume of 50 μl is carried out with the thermal profile 94° C. (1 min), 55° C. (1 min), 72° C. (5 min). A 2 μl aliquot is then transferred to a 100 μl standard PCR reaction comprising 50 mM KCl, 10 mM Tris-HCl (pH9.0), 0.1% Triton X-100, 1.5 mM MgCl₂, dNTPs (250 μM), primers 6 and 7 (1 μM each), Taq polymerase (2.5 units). This reaction is cycled 30× with the profile 94° C. (30 seconds), 55° C. (30 seconds), 72° C. (4 minutes). Amplified product is gel-purified, and cloned into pASK75 as above to create library L2.

The second method utilises a combination of biased dNTPs and MnCl₂ to introduce errors during PCR. The reaction mix comprises 50 mM KCl, 10 mM Tris-HCl (pH9.0), 0.1% Triton X-100, 2.5 mM MgCl₂, 0.3 mM MnCl₂, 1 pM template DNA, dTTP, dCTP, dGTP (all 1 mM), dATP (100 μM), primers 8 and 9 (1 μM each) and Taq polymerase (2.5 units). This reaction is cycled 30× with the profile 94° C. (30 seconds), 55° C. (30 seconds), 72° C. (4 minutes), and amplified products cloned as above to create library L1.

Example 6 Selection Protocol

For selection of active polymerases, PCR reactions within emulsions are carried out as described above but using primers 8, 9. For selection of variants with increased thermostability, emulsions are preincubated at 99° C. for up to 7 minutes prior to cycling as above. For selection of variants with increased activity in the presence of the inhibitor heparin, the latter is added to concentrations of 0.08 and 0.16 units/μl and cycling carried out as above. Detailed protocols are set out in further Examples below.

Amplification products resulting from compartments containing an active polymerase are extracted from emulsion with ether as before and then purified by standard phenol-chloroform extraction. 0.5 volumes of PEG/MgCl₂ solution (30% v/v PEG 800, 30 mM MgCl₂) is next added and, after mixing, centrifugation is carried out at 13,000 RPM for 10 minutes at room temperature. The supernatant (containing unincorporated primers and dNTPs) is discarded and the pellet re-suspended in TE. Amplified products are then further purified on spin-columns (Qiagen) to ensure complete removal of primers. These products are then re-amplified using primers 6, 7 (which are externally nested to primers 8 and 9) in a standard PCR reaction, with the exception that only 20 cycles are used. Re-amplified products are gel-purified and re-cloned into pASK75 as above. Transformants are plated and colonies screened as below. The remainder are scraped into 2×TY/0.1 mg/ml ampicillin, diluted down to OD₆₀₀=0.1 and grown/induced as above for repetition of the selection protocol.

Example 7 Colony Screening Protocol

Colonies are picked into a 96 well culture dish (Costar), grown and induced for expression as above. For screening, 2 μl of cells are used in a 30 μl PCR reaction to test for activity as above in a 96 well PCR plate (Costar) using primers 4 and 5. A temperature gradient block is used for the screening of selectants with increased thermostability. Reactions are preincubated for 5 minutes at temperatures ranging from 94.5 to 99° C. prior to standard cycling as above with primers 4 and 5 or 3 and 4. For screening of heparin-compatible polymerases, heparin is added to 0.1 units/30 μl during the 96-well format colony PCR screen. Active polymerases are then assayed in a range of heparin concentrations ranging from 0.007 to 3.75 units/30 μl and compared to wild-type.

Example 8 Assay for Catalytic Activity of Polymerases

K_(cat) and K_(m) (dTTP) are determined using a homopolymeric substrate (Polesky et al., 1990, J. Biol. Chem. 265:14579-91). The final reaction mix (25 μl) comprises 1× SuperTaq buffer (HT Biotech), poly(dA).oligo(dT) (500 nM, Pharmacia), and variable concentrations of [α-³²P]dTTP (approx. 0.01 Ci/mmole). The reaction is initiated by addition of 5 μl enzyme in 1× SuperTaq buffer to give final enzyme concentrations between 1 and 5 nM. Reactions are incubated for 4 minutes at 72° C., quenched with EDTA as in example 14, and applied to 24 mm DE-81 filters. Filters are washed and activity measured as in example 14. Kinetic parameters are determined using the standard Lineweaver-Burk plot. Experiments using 50% reduced homopolymer substrate show no gross difference in incorporation of dTTP by polymerase, indicating it is present in sufficient excess to validate the kinetic analysis protocol used.
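The Lineweaver-Burk analysis named above can be sketched as follows. This is a minimal illustration only, assuming simple Michaelis-Menten behaviour and an unweighted least-squares line through the double-reciprocal points; the function names (lineweaver_burk, k_cat) and the units noted in the comments are assumptions made for the sketch and are not taken from the Examples.

```python
from typing import Sequence, Tuple


def lineweaver_burk(substrate: Sequence[float],
                    rate: Sequence[float]) -> Tuple[float, float]:
    """Return (Km, Vmax) from a double-reciprocal least-squares fit.

    Fits 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax.
    Km is returned in the units of `substrate`, Vmax in the units of `rate`.
    """
    x = [1.0 / s for s in substrate]   # 1/[S]
    y = [1.0 / v for v in rate]        # 1/v
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # Km / Vmax
    intercept = mean_y - slope * mean_x  # 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


def k_cat(vmax: float, enzyme_conc: float) -> float:
    """Turnover number: Vmax divided by total enzyme concentration.

    With Vmax in nM/s and enzyme in nM, the result is in s^-1.
    """
    return vmax / enzyme_conc
```

Here substrate would hold the dTTP concentrations and rate the corresponding initial incorporation rates. Because the double-reciprocal transform weights low-substrate points heavily, a direct non-linear fit may be preferable when such points are noisy.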

Example 9 Standard PCR in Aqueous Compartments within an Emulsion

To establish whether conditions in the aqueous compartments present in an emulsion are permissive for catalysis, a standard reaction mix is emulsified and PCR carried out. This leads to amplification of the correct sized Taq polymerase gene present in the plasmid template, with yields sufficient to allow visualisation using standard agarose gel electrophoresis.

Example 10 Emulsification of E. coli Expressing Taq Polymerase and Subsequent PCR to Amplify Polymerase Gene

E. coli cells expressing Taq polymerase are emulsified and PCR carried out using primers flanking the polymerase cassette in the expression vector. Emulsification of up to 5×10⁸ cells (per 600 μl total volume) leads to discernible product formation as judged by agarose gel electrophoresis. The cells therefore segregate into the aqueous compartments where conditions are suitable for self-amplification of the polymerase gene by the expressed Taq polymerase. Similar emulsions are estimated to contain about 1×10¹⁰ compartments per ml (Tawfik D. and Griffiths A. D., 1998, Nature Biotech. 16, 652). The large number of cells that can be emulsified allows for selection from diverse repertoires of randomised protein.
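The cell and compartment numbers above allow a rough estimate of compartment occupancy. The sketch below assumes that cells distribute among compartments independently and at random, so that occupancy follows a Poisson distribution, and that the estimate of about 1×10¹⁰ compartments per ml applies to the 600 μl total volume; both are assumptions made for illustration, and the function name poisson_occupancy is hypothetical.

```python
import math
from typing import Dict


def poisson_occupancy(cells: float, compartments: float) -> Dict[str, float]:
    """Poisson estimate of how cells distribute among emulsion compartments."""
    lam = cells / compartments      # mean number of cells per compartment
    p_empty = math.exp(-lam)        # P(0 cells)
    p_single = lam * p_empty        # P(exactly 1 cell)
    p_multiple = 1.0 - p_empty - p_single  # P(2 or more cells)
    return {
        "mean_cells": lam,
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        # fraction of cell-containing compartments that hold exactly one cell
        "single_given_occupied": p_single / (1.0 - p_empty),
    }


# 5e8 cells in 0.6 ml at an assumed 1e10 compartments per ml
estimate = poisson_occupancy(cells=5e8, compartments=1e10 * 0.6)
```

Under these assumptions most compartments are empty, and of those that contain cells roughly 96% contain exactly one; the remainder corresponds to the minor fraction with two or more cells.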

Example 11 Maintenance of Genotype-Phenotype Linkage in Emulsion

To be viable for a selection method, the majority of aqueous compartments in the emulsion should harbour a single cell, and the integrity of compartments should be maintained during thermal cycling. This is tested by including in the emulsion cells harbouring a competitor template distinguishable by its smaller size.

E. coli expressing Taq polymerase are co-emulsified with E. coli expressing the Stoffel fragment at a ratio of one to one. The Stoffel fragment is poorly active under the conditions used in emulsion, and thus amplification of its expression cassette by the same primer pair used for Taq self-amplification is the result of co-compartmentalisation with a cell expressing active Taq polymerase or leakage of Taq polymerase between compartments. After PCR, the vast majority of products are found to correspond to the active Taq polymerase gene, thus validating the premise of one cell per durable compartment (see FIG. 2, Ghadessy et al., 2001, PNAS 98, 4552).

Example 12 Test Selection of Active over Inactive Taq Polymerase

To demonstrate that the method can select for potentially rare variants, cells expressing inactive polymerase are co-emulsified at a 10⁶-fold excess over those expressing the active form. After PCR and cloning of amplified product, a single expression screen using a 96 well format indicated a 10⁴-fold enrichment for the active polymerase.

Example 13 Directed Evolution of Taq Polymerase Variants with Increased Thermal Stability
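The enrichment arithmetic in this test selection can be made explicit. The following is a minimal sketch, assuming enrichment is defined as the ratio of the active-clone fraction after selection to that before selection; the function names are hypothetical and the definition is an assumption, not a statement of how the figure above was calculated.

```python
def fraction_from_excess(fold_excess: float) -> float:
    """Fraction of active clones when inactive clones are in `fold_excess` excess."""
    return 1.0 / (1.0 + fold_excess)


def fraction_after_enrichment(fraction_before: float, enrichment: float) -> float:
    """Active-clone fraction expected after a given fold enrichment."""
    return fraction_before * enrichment


def enrichment_factor(fraction_before: float, fraction_after: float) -> float:
    """Fold enrichment: active fraction after selection over that before."""
    return fraction_after / fraction_before


before = fraction_from_excess(1e6)              # about 1 active clone per 10^6
after = fraction_after_enrichment(before, 1e4)  # about 1 active clone per 100
```

On this definition, a 10⁴-fold enrichment from a 10⁶-fold excess corresponds to about one active clone per hundred, which is consistent with detection in a single 96 well screen.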

Polymerases with increased thermostability are of potential practical importance, reducing activity loss during thermocycling and allowing higher denaturation temperatures for the amplification of GC rich templates. Thus, we first used the selection method of our invention for the directed evolution of Taq variants with increased thermostability, starting from preselected libraries (L1*, L2*) and progressively increasing the temperature and duration of the initial thermal denaturation. After 3 rounds of selection, we isolated T8 (Table 1), a Taq clone with an 11-fold longer half-life at 97.5° C. than the already thermostable wt Taq enzyme (Table 2), making T8 the most thermostable member of the Pol I family on record. Clones are screened and ranked by a PCR assay. Briefly, 2 μl of induced cells are added to 30 μl PCR mix and amplification of a 0.4 kb fragment is assayed under selection conditions (e.g. increasing amounts of heparin). Thermostability and heparin resistance of purified His tagged wt and mutant Taq clones are determined as in Lawyer et al., 1993, PCR Methods Appl. 2, 275-287; Lawyer et al., 1989, J. Biol. Chem. 264, 6427-37, using activated salmon sperm DNA and normalized enzyme concentrations. Mutations conferring thermostability to T8 (and to a majority of less thermostable mutants) cluster in the 5′-3′ exonuclease domain (Table 1). Indeed, truncation variants of Taq polymerase (F. C. Lawyer et al., 1993, PCR Methods Appl. 2, 275-87; W. M. Barnes, 1992, Gene 112, 29-35) lacking the exonuclease domain show improved thermostability, suggesting it may be less thermostable than the main polymerase domain. The lower thermostability of the exonuclease domain may have functional significance (for example reflecting a need for greater flexibility), as the stabilizing mutations in T8 appear to reduce exonuclease activity (approx. 5-fold) (5′-3′ exonuclease activity is determined essentially as in (Y. Xu et al., 1997, J. Mol. Biol. 268, 284-302) but in 1× Taq buffer with 0.25 mM dNTPs and the 22-mer oligonucleotide of (Y. Xu et al., 1997, J. Mol. Biol. 268, 284-302) 5′ labelled with (Amersham). Steady-state kinetics are measured as in A. H. Polesky, T. A. Steitz, N. D. Grindley, C. M. Joyce, 1990, J. Biol. Chem. 265, 14579-91, using the homopolymeric substrate poly(dA)₂₀₀ (Pharmacia) and oligo(dT)₄₀ primer at 50° C.) (at least at low temperature).

TABLE 1 Properties of Selected Clones

Round  Taq variant                                               Thermostability*  Heparin resistance**
-      Taq_(wt)                                                  1                 1
1      T646 (G46V, A109P, F285L)                                 2x                n.d.
1      T788 (F73S, R205K, K219E, M236T, A608V)                   4x                n.d.
2      T9 (F278L, P298S)                                         4x                n.d.
2      T13 (R205K, K219E, M236T, A608V)                          7x                n.d.
3      T8 (F73S, R205K, K219E, M236T, E434D, A608V)              11x               <0.5x
1      H32 (E9K, P93S, K340E, Q534R, T539A, V703A, R778K)        n.d.              8x
2      H94 (K225E, L294P, A454S, L461R, D578G, N583S)            n.d.              32x
3      H15 (K225E, E388V, K540R, D578G, N583S, M747R)            0.3x              130x

*as judged by PCR (relative to Taq_(wt)), at 97.5° C.
**as judged by PCR (relative to Taq_(wt))
Clones in bold are related through underlined mutations. Clones are ranked in relation to wt Taq.

Two libraries of Taq polymerase variants generated using error-prone PCR are expressed in E. coli (library L1, 8×10⁷ clones, library L2, 2×10⁷ clones; see example 5) and emulsified as before. The first round of PCR is carried out to enrich for active variants using the standard Taq polymerase thermocycling profile outlined above. Enriched amplification products are purified, and recloned to generate libraries consisting of active variants (L1*, L2*; approx. 10⁶ clones for each library). A screen of the L1* and L2* libraries respectively showed 81% and 77% of randomly picked clones to be active.

Selective pressure is applied to the L1* and L2* libraries during the next round of PCR by pre-incubating emulsions at 99° C. for 6 or 7 minutes prior to the normal PCR cycle. Under these conditions, the wild-type Taq polymerase loses all activity. Amplified products are enriched and cloned as above and a 96-well expression screen used to select for active variants under normal PCR conditions. This yielded 7 clones from the L2* library and 10 clones from the L1* library. These are then screened for increased thermostability using a temperature gradient PCR block, with a 5 minute pre-incubation at temperatures of 94.5 to 99° C. prior to standard cycling. As judged by gel electrophoresis, 5 clones from each library show increased thermostability compared to wild-type. These mutants are able to efficiently amplify the 320 b.p. target after pre-incubation at 99° C. for 5 minutes. The wild-type enzyme has no discernible activity after pre-incubation at temperatures above 97° C. for 5 minutes or longer.

Example 14 Assay for Thermal Stability of Polymerase

Thermal inactivation assays of WT and purified His-tagged polymerases are carried out in a standard 50 μl PCR mixture comprising 1× SuperTaq buffer (HT Biotech), 0.5 ng plasmid DNA template, 200 μM each of dATP, dTTP, and dGTP, primers 3 and 4 (10 μM), and polymerase (approximately 5 nM). Reaction mixtures are overlaid with oil and incubated at 97.5° C., with 5 μl aliquots being removed and stored on ice after defined intervals. These aliquots are assayed in a 50 μl activity reaction buffer comprising 25 mM N-tris[hydroxymethyl]methyl-3-aminopropanesulfonic acid (TAPS) (pH9.5), 1 mM β-mercaptoethanol, 2 mM MgCl₂, 200 μM each dATP, dTTP, and dGTP, 100 μM [α-³²P]dCTP (0.05 Ci/mmole), and 250 μg/ml activated salmon sperm DNA template. Reactions are incubated for 10 minutes at 72° C. and stopped by addition of EDTA (25 mM final). Reaction volumes are made up to 500 μl with solution S (2 mM EDTA, 50 μg/ml sheared salmon sperm DNA) and 500 μl 20% TCA (v/v)/12% sodium pyrophosphate (v/v) added. After 20 minutes incubation on ice, reactions are applied to 24 mm GF/C filters (Whatman). Unincorporated nucleotides are removed by 3 washes with 5% TCA (v/v), 2% sodium pyrophosphate (v/v) followed by two washes with 96% ethanol (v/v). Dried filters are counted in scintillation vials containing Ecoscint A (National Diagnostics). The assay is calibrated using a known amount of the labeled dCTP solution (omitting the washes).

Example 15 Directed Evolution of Taq Polymerase Variants with Increased Activity in the Presence of the Inhibitor Heparin
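A half-life at 97.5° C. can be derived from residual activities measured in this way. The sketch below assumes first-order (single-exponential) loss of activity, which is an assumption made for illustration and is not stated in the protocol; the function name half_life is hypothetical.

```python
import math
from typing import Sequence


def half_life(times_min: Sequence[float], activities: Sequence[float]) -> float:
    """Half-life in minutes from a residual-activity time course.

    Assumes first-order inactivation, A(t) = A0 * exp(-k * t), and fits
    ln A against t by unweighted least squares; T1/2 = ln 2 / k.
    """
    y = [math.log(a) for a in activities]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(y) / n
    stt = sum((t - mean_t) ** 2 for t in times_min)
    sty = sum((t - mean_t) * (yi - mean_y) for t, yi in zip(times_min, y))
    k = -sty / stt                 # first-order inactivation rate constant
    return math.log(2.0) / k
```

Here activities would be the incorporated counts for each aliquot, in any consistent unit, since only their ratios matter. The half-lives listed in Table 2 for T8 and Taq_(wt) (16.5 and 1.5 minutes) give the 11-fold ratio quoted in Example 13.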

As indicated above, the methods of our invention can also be used to evolve resistance to an inhibitor of enzymatic activity. Heparin is a widely used anticoagulant, but also a potent inhibitor of polymerase activity, creating difficulties for PCR amplifications from clinical blood samples (J. Satsangi, D. P. Jewell, K. Welsh, M. Bunce, J. I. Bell, 1994, Lancet 343, 1509-10). While heparin can be removed from blood samples by various procedures, these can be both costly and time-consuming. The availability of a heparin-compatible polymerase would therefore greatly improve characterisation of therapeutically significant amplicons, and obviate the need for possibly cost-prohibitive heparinase treatment of samples (Taylor A. C., 1997, Mol. Ecol. 6, 383).

The L1* and L2* libraries are combined, and selected in emulsion for polymerases active in up to 0.16 units heparin per μl. After a single round, 5 active clones are isolated in the 96 well PCR screen incorporating 0.1 units/30 μl reaction, with the wild-type showing no activity. Titration shows 4 of these clones to be active in up to four times the amount of heparin inhibiting wild-type (0.06 units/30 μl versus 0.015 units/30 μl). The other clone is active in up to eight times the amount of heparin inhibiting wild-type (0.12 units/30 μl versus 0.015 units/30 μl).

Using selection in the presence of increasing amounts of heparin, we isolated H15, a Taq variant functional in PCR at up to 130-times the inhibitory concentration of heparin (Table 2). Intriguingly, heparin resistance conferring mutations also cluster, in this case in the base of the finger and thumb polymerase subdomains, regions involved in binding duplex DNA. Indeed, judging from a recent high-resolution structure of a Taq-DNA complex (Y. Li, S. Korolev, G. Waksman, 1998, EMBO J. 17, 7514-25), four out of six residues mutated in H15 (K540, D578, N583, M747) directly contact either template or product strand (as shown in FIG. 7). H15 mutations appear to be neutral (or mutually compensating) as far as affinity for duplex DNA is concerned (while presumably reducing affinity for heparin) (Table 2) (K_(D) for DNA is determined using BIAcore. Briefly, the 68-mer used in (M. Astatke, N. D. Grindley, C. M. Joyce, 1995, J. Biol. Chem. 270, 1945-54) is biotinylated at the 5′ end and bound to a SA sensorchip and binding of polymerases is measured in 1× Taq buffer (see above) at 20° C. Relative K_(D) values are estimated by the PCR ranking assay using decreasing amounts of template). The precise molecular basis of heparin inhibition is not known, but our results strongly suggest overlapping (and presumably mutually exclusive) binding sites for DNA and heparin in the polymerase active site, lending support to the notion that heparin exerts its inhibitory effect by mimicking and competing with duplex DNA for binding to the active site. Our observation that heparin inhibition is markedly reduced under conditions of excess template DNA (see Table 2) appears consistent with this hypothesis. (Clones are screened and ranked by a PCR assay. Briefly, 2 μl of induced cells are added to 30 μl PCR mix and amplification of a 0.4 kb fragment is assayed under selection conditions (e.g. increasing amounts of heparin). Thermostability and heparin resistance of purified His tagged wt and mutant Taq clones are determined as in (F. C. Lawyer et al., 1993, PCR Methods Appl. 2, 275-87; F. C. Lawyer et al., 1989, J. Biol. Chem. 264, 6427-37) using activated salmon sperm DNA and normalized enzyme concentrations.)

TABLE 2 Properties of Selected Taq Clones

Column headings, in order: Taq clone; T_(1/2) (97.5° C.) (min); Heparin resistance (units/ml); K_(D) (nM⁻¹); k_(cat) (s⁻¹); K_(M-dTTP) (μM); 5′-3′ exo activity; Mutation Rate^(§)

Taq*       n.d.     n.d.    0.6***   0.8^(†)   4.0^(‡)   43.2   n.d.   1.1
Taq_(wt)   1.5**    90**    0.6***   0.8       9.0       45.0   1      1
T8         16.5**   n.d.    0.3***   1.2       8.8       48.6   0.2    1.2
H15        0.3***   1750*   84***    0.79      6.8       47.2   1.5    0.9

*commercial Taq preparation (HT Biotechnology)
**with N-terminal His₆ tag, measured by CTP³² incorporation into salmon sperm DNA
***no tag, measured by PCR assay
^(†)Taq, published value: 1 nM⁻¹ (1), Klenow (Cambio), 4 nM⁻¹
^(‡)E. coli DNA Pol I, published value: 3.8 s⁻¹ (A. H. Polesky, T. A. Steitz, N. D. Grindley, C. M. Joyce, 1990, J. Biol. Chem. 265, 14579-91)
^(§)in relation to Taq_(wt), measured by mutS ELISA (Genecheck) (P. Debbie et al., 1997, Nucleic Acids Res. 25, 4825-4829), Pfu (Stratagene): 0.2

Example 16 Template Evolution in Emulsion Selection

A classic outcome of in vitro replication experiments is an adaptation of the template sequence towards more rapid replication (S. Spiegelman, 1971, Q. Rev. Biophys. 4, 213-253). Indeed, we also observe template evolution through silent mutations. Unlike the coding mutations (AT to GC vs. GC to AT: 29 vs. 16), non-coding mutations display a striking bias (AT to GC vs. GC to AT: 0 vs. 42) towards decreased GC content, generally thought to promote more efficient replication by facilitating strand separation and destabilizing secondary structures. Apart from selecting for adaptation, our method may also select for adaptability; i.e. polymerases might evolve towards an optimal, presumably higher, rate of self-mutation (M. Eigen, 1971, Naturwissenschaften 58, 465-523). Indeed, mutators can arise spontaneously in asexual bacterial populations under adaptive stress (F. Taddei et al., 1997, Nature 387, 700-2; P. D. Sniegowski, P. J. Gerrish, R. E. Lenski, 1997, Nature 387, 703-5). By analogy, it could be argued that our method might favour polymerase variants that are more error-prone and hence capable of faster adaptive evolution. However, none of the selected polymerases displayed increased error rates (Table 2). Eliminating recombination and decreasing the mutational load during our method cycle may increase selective pressures towards more error-prone enzymes.
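To illustrate how strongly the non-coding mutation counts quoted above differ from the coding counts, the following Python sketch computes the fraction of GC to AT changes in each class and a one-sided Fisher-type tail probability. This statistical treatment is an illustrative assumption and is not an analysis reported in this specification; the function names are hypothetical.

```python
from math import comb

# Illustrative only: the mutation counts are taken from the text, the
# statistical test is an assumption added for the purpose of this sketch.

def fraction_gc_to_at(at_to_gc: int, gc_to_at: int) -> float:
    """Fraction of directional mutations that lower GC content."""
    return gc_to_at / (at_to_gc + gc_to_at)


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """Probability of k successes in `draws` draws without replacement."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def one_sided_p(at_gc_noncoding: int, gc_at_noncoding: int,
                at_gc_coding: int, gc_at_coding: int) -> float:
    """Probability of observing this few (or fewer) AT-to-GC changes among the
    non-coding mutations if direction were independent of mutation class."""
    total = at_gc_noncoding + gc_at_noncoding + at_gc_coding + gc_at_coding
    successes = at_gc_noncoding + at_gc_coding
    draws = at_gc_noncoding + gc_at_noncoding
    return sum(hypergeom_pmf(k, total, successes, draws)
               for k in range(at_gc_noncoding + 1))


coding = fraction_gc_to_at(29, 16)
non_coding = fraction_gc_to_at(0, 42)
p_value = one_sided_p(0, 42, 29, 16)
print(f"GC-to-AT fraction, coding: {coding:.2f}; non-coding: {non_coding:.2f}")
print(f"one-sided tail probability: {p_value:.1e}")
```

Under these assumptions the tail probability is very small, which is consistent with describing the non-coding bias as striking.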

Example 17 Assay for Heparin Tolerance of Polymerases

Heparin tolerance of polymerases is assayed using a similar assay to that for thermal stability. Heparin is serially diluted into the activity buffer (0-320 units/45 μl) and 5 μl of enzyme in the standard PCR mixture above are added. Reactions are incubated and incorporation assayed as above.

Example 18 Selection for Taq Variants with Increased Ability to Extend from a 3′ Mismatched Base

The primers used are Primer 10 (LMB388ba5WA) and Primer 11 (8fo2WC). This primer combination presents polymerase variants with a 3′ purine-purine mismatch (A-G), and a 3′ pyrimidine-pyrimidine mismatch (C-C). These are the mismatches least tolerated by Taq polymerase (Huang et al., 1992, Nucleic Acids Res. 20 (17), 4567-73) and are poorly extended.

The selection protocol is essentially the same as before, except that these two primers are used in emulsion. Extension time is also increased to 8 minutes. After two rounds of selection, 7 clones are isolated which display up to a 16-fold increase in extension off the mismatch as judged by a PCR ranking assay (see example 2: using primers 5 and 11) and standardised for activity using the normal primer pair. These clones are subsequently shuffled back into the original L1* and L2* libraries along with wild-type Taq and the selection process repeated, albeit with a lower number of cycles (10) during the CSR reaction. This round of selection yielded numerous clones, the best of which displayed up to 32-fold increase in mismatch extension as judged by PCR (see example 2) using primers 5 and 11.

Incorporation of an incorrect base pair by Taq polymerase can stall the polymerisation process as certain mismatches (see above) are poorly extended by Taq. As such, Taq polymerase alone cannot be used in the amplification of large (>6 kb) templates (Barnes). This problem can be overcome by supplementing Taq with a polymerase that has a 3′-5′ exonuclease activity (e.g. Pfu polymerase) that removes incorrectly incorporated bases and allows resumption of polymerisation by Taq. The clones above are therefore investigated for their ability to carry out amplification of large DNA fragments (long-distance PCR) from a lambda DNA template, as incorporation of an incorrect base would not be expected to stall polymerisation. Using primers 12 (LBA23) and 13 (LF046) (1 μM each) in a 50 μl PCR reaction containing 3 ng lambda DNA (New England Biolabs), dNTPs (0.2 mM) and 1× PCR buffer (HT Biotech), clone M1 is able to amplify a 23 kb fragment using 20 repetitions of a 2-step amplification cycle (94° C., 15 seconds; 68° C., 25 minutes). Wild-type polymerase is unable to extend products above 13 kb using the same reaction buffer. Commercial Taq (Perkin Elmer) could not extend beyond 6 kb using buffer supplied by the manufacturer.

Example 19 Selection Using Self-Sustained Sequence Replication (3SR)

To demonstrate the feasibility of 3SR within emulsion, the Taq polymerase gene is first PCR-amplified from the parent plasmid (see example 1) using a forward primer that is designed to incorporate a T7 RNA polymerase promoter into the PCR product. A 250 μl 3SR reaction mix comprising the modified Taq gene (50 ng), 180 units T7 RNA polymerase (USB), 63 units reverse transcriptase (HT Biotech), rNTPs (12.5 mM), dNTPs (1 mM), MgCl₂ (10 mM), primer Taqba2T7 (primer 12; 125 pmoles), primer 88fo2 (primer 4; 125 pmoles), 25 mM Tris-HCl (pH 8.3), 50 mM KCl, and 2.0 mM DTT is made. 200 μl of this is emulsified using the standard protocol. After prolonged incubation at room temperature, amplification of the Taq gene (representing a model gene size) within emulsion is seen to take place as judged by standard gel-electrophoresis.

To further expand the scope of the method, the 3SR reaction is carried out in an in-vitro transcription/translation extract (EcoPro, Novagen). The inactive Taq gene (see example 1) is amplified from parental plasmid using primers 2 (TaqfoSal) and 12 (Taqba2T7). 100 ng (approx. 1×10¹⁰ copies) is added to make up 100 μl of the aqueous phase comprising EcoPro extract (70 μl), methionine (4 μl), reverse transcriptase (84 units, HT Biotech), primer 12 (Taqba2T7, 2 μM), primer 13 (TaqfoLMB2, 2 μM), dNTPs (250 μM). The aqueous phase is emulsified into 400 μl oil-phase using the standard protocol. After incubation at 37° C. overnight, the emulsion is extracted using the standard protocol and the aqueous phase further purified using a PCR-purification column (Qiagen). Complete removal of primers is ensured by treating 5 μl of column eluate with 2 μl ExoZap reagent (Stratagene). DNA produced in emulsion by 3SR is rescued by using 2 μl of treated column eluate in an otherwise standard 50 μl PCR reaction using 20 cycles of amplification and primers 6 (LMB, ref 2) and 12 (Taqba2T7). Compared to background (the control reaction where reverse transcriptase is omitted from the 3SR reaction in emulsion), a more intense correctly sized band could be seen when products are visualised using agarose gel electrophoresis. The 3SR reaction can therefore proceed in the transcription/translation extracts, allowing for the directed evolution of agents expressed in aqueous compartments.
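The approximate copy number quoted for 100 ng of template can be checked with the usual mass-to-molecules conversion for double-stranded DNA. The Python sketch below is a rough estimate only: the amplicon length of 2,500 bp and the average mass of 650 g/mol per base pair are assumptions made for illustration and are not stated in this example.

```python
# Rough estimate of template copies from DNA mass.
# ASSUMPTIONS (not from the text): amplicon length and average bp mass.

AVOGADRO = 6.02214076e23          # molecules per mole
AVG_MASS_PER_BP = 650.0           # g/mol per base pair, common approximation
ASSUMED_AMPLICON_BP = 2500        # hypothetical length of the amplified Taq gene


def dsdna_copies(mass_ng: float, length_bp: int) -> float:
    """Number of double-stranded DNA molecules in `mass_ng` nanograms."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * AVG_MASS_PER_BP)
    return moles * AVOGADRO


copies = dsdna_copies(100.0, ASSUMED_AMPLICON_BP)
print(f"about {copies:.1e} copies in 100 ng")
```

With these assumed inputs the estimate falls in the 10¹⁰ range, the same order of magnitude as the approximate figure given in the text.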

WT Taq polymerase has limited reverse transcriptase activity (Perler et al., 1996, Adv. Protein Chem. 48, 377-435). It is also known that reverse transcriptases (e.g. HIV reverse transcriptase, which has both reverse transcriptase and polymerase activities) are considerably more error prone than other polymerases. This raises the possibility that a more error-prone polymerase (where increased tolerance for non-cognate substrate is evident) might display increased reverse transcriptase activity. The genes for Taq variants M1, M4 as well as the inactive mutant are amplified from parental plasmids using primers 12 (Taqba2T7) and 2 (TaqfoSal) and the 3SR reaction is carried out as above in the transcription/translation extract (Novagen) with the exception that reverse transcriptase is not exogenously added. In control reactions, methionine is omitted from the reaction mix. After 3 hours incubation at 37° C., the reaction is treated as above and PCR carried out using primer pair 6 and 12 to rescue products synthesised during the 3SR reaction. Of the clones tested, clone M4 gave a more intense correctly sized band compared to control reaction when products are visualised using agarose gel electrophoresis. Clone M4 would therefore appear to possess some degree of reverse transcriptase activity. This result shows that it is possible to express functionally active replicases in vitro. When coupled to selection by compartmentalisation, novel replicases could be evolved.

Selection of Agents Modifying Replicase Activity

Example 19 and the following Examples describe how the methods of our invention may be employed to select an enzyme which is involved in a metabolic pathway whose final product is a substrate for the replicase. These Examples show a method for selection of nucleoside diphosphate kinase (NDP Kinase), which catalyses the transfer of a phosphate group from ATP to a deoxynucleoside diphosphate to produce a deoxynucleoside triphosphate (dNTP). Here, the selectable enzyme (NDK) provides substrates for Taq polymerase to amplify the gene encoding it. This selection method differs from the compartmentalized self-replication of a replicase (CSR, Ghadessy and Holliger) in that replication is a coupled process, allowing for selection of enzymes (nucleic acids and protein) that are not replicases themselves. Bacteria expressing NDK (and containing its gene on an expression vector) are co-emulsified with its substrate (in this case, dNDPs and ATP) along with the other reagents needed to facilitate its amplification (Taq polymerase, primers specific for the ndk gene, and buffer). Compartmentalization in a water-in-oil emulsion ensures the segregation of individual library variants. Active clones provide the dNTPs necessary for Taq polymerase to amplify the ndk gene. Variants with increased activity provide more substrate for their own amplification and hence post-selection copy number correlates to enzymatic activity within the constraints of polymerase activity. Additional selective pressure arises from the minimum amount of dNTPs required for polymerase activity, hence clones with increased catalytic activity are amplified preferentially at the expense of poorly active variants (selection is for k_(cat) as well as K_(m)).

By showing that we can evolve an enzyme whose product feeds into the polymerase reaction, we hope to eventually co-evolve multiple enzymes linked through a pathway where one enzyme's product is substrate for the next. Diversity could be introduced into two or more genes, and both genes could be co-transformed into the same expression host on plasmids or phage. We hope to develop cooperative enzyme systems that enable selection for the synthesis of unnatural substrates and their subsequent incorporation into DNA.

Example 20 Induced Expression of NDP Kinase in Bacterial Cells

A pUC19 expression plasmid containing the EcoRI/HindIII restriction fragment with the open reading frame of nucleoside diphosphate kinase from Myxococcus xanthus is cloned. Plasmid is prepared from an overnight culture and transformed into the ndk-, pykA-, pykF- strain of E. coli, QL1387. An overnight culture of QL1387/pUC19ndk is grown in the presence of chloramphenicol (10 μg/ml final concentration), ampicillin (100 μg/ml final concentration) and glucose (2%) for 14-18 hours. The overnight culture is diluted 1:100 in 2×TY containing 10 μg/ml chloramphenicol, 100 μg/ml ampicillin and 0.1% glucose. Cells are grown to an O.D. (600 nm) of 0.4 and induced with IPTG (1 mM final concentration) for 4 hours at 37° C. After protein induction, cells are washed once in SuperTaq buffer (10 mM Tris-HCl pH 9, 50 mM KCl, 0.1% Triton X-100, 1.5 mM MgCl₂, HT Biotechnology) and resuspended in 1/10 volume of the same buffer. The number of cells is quantified by spectrophotometric analysis with the approximation that an O.D. (600 nm) of 0.1 corresponds to 1×10⁸ cells/ml.
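The optical density approximation given above can be turned into a simple cell count and dosing calculation. The following Python sketch assumes a linear relationship between O.D. and cell density over the range used; the example O.D. reading, target density and reaction volume are hypothetical values chosen only to show the arithmetic.

```python
# Cell-density estimate from O.D. (600 nm), using the approximation in the
# text that O.D. 0.1 corresponds to 1e8 cells/ml. Linearity is an assumption.

CELLS_PER_ML_PER_OD_UNIT = 1e8 / 0.1


def cells_per_ul(od600: float) -> float:
    """Estimated cells per microlitre for a given O.D. (600 nm) reading."""
    return od600 * CELLS_PER_ML_PER_OD_UNIT / 1000.0


def suspension_volume_ul(od600: float, target_cells_per_ul: float,
                         final_volume_ul: float) -> float:
    """Volume of cell suspension needed to reach a target density in a reaction."""
    return target_cells_per_ul * final_volume_ul / cells_per_ul(od600)


# Hypothetical example: a resuspended stock reading O.D. 4.0 (after dilution
# correction), dosed into a 150 ul mix at 8e5 cells/ul.
stock_density = cells_per_ul(4.0)
volume = suspension_volume_ul(4.0, 8e5, 150.0)
print(f"stock: {stock_density:.1e} cells/ul; add {volume:.0f} ul to the mix")
```

In practice a dense suspension would be diluted before reading so that the measurement stays within the linear range of the spectrophotometer.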

Example 21 Phosphoryl Transfer Reaction in Aqueous Compartments within an Emulsion

To establish whether deoxynucleoside diphosphates can be phosphorylated by NDP kinase in Taq buffer, a standard PCR reaction is carried out in which dNTPs are replaced by dNDPs and ATP, a donor phosphate molecule. Nucleoside diphosphate kinase is expressed from E. coli QL1387 (an ndk and pyruvate kinase deficient strain of E. coli) as described in the previous example. Cells are mixed with the PCR reaction mix.

Washed cells are added to a PCR reaction mixture (approx. 8×10⁵ cells/μl final concentration) containing SuperTaq buffer, 0.5 μM primers, 100 μM each dNDP, 400 μM ATP, SuperTaq polymerase (0.1 unit/μl final concentration, HT Biotechnology).

After breaking open the cells at 65° C. for 10 min, incubating the reaction mixture for 10 minutes at 37° C., and thermocycling (15 cycles of 94° C. 15 sec, 55° C. 30 sec, 72° C. 1 min 30 sec), amplified products are visualized on a standard 1.5% agarose/TBE gel stained with ethidium bromide (Sambrook). The results of this experiment show that expressed NDP kinase can phosphorylate dNDPs to provide Taq polymerase with substrates for the PCR amplification of the ndk gene.

The experiment is repeated, with the additional step of emulsifying the reaction mixture with mineral oil and detergent as described above. It is found that NDP kinase is active within aqueous compartments of an emulsion.

Example 22 Compartmentalization of NDK Variants by Emulsification

The original emulsion mix allowed for the diffusion of small molecules between compartments during thermocycling. However, by adjusting the water to oil ratio and minimizing the thermocycling profile, the exchange of product and substrate between compartments is minimized, resulting in a tighter linkage of genotype to phenotype. Given that diffusion rates can be controlled by modifying the emulsion mix, it may be possible to adjust buffer conditions after emulsification, possibly allowing for greater control of selection conditions (i.e. adjusting pH with the addition of acid or base, or starting/stopping reactions with the addition of substrates or inhibitors).

150 μl of PCR reaction mix (SuperTaq buffer, 0.5 μM each primer, 100 μM each dNDP, 400 μM ATP, 0.1 unit/μl Taq polymerase, 8×10⁵ cells/μl of QL1387/ndk) are added dropwise (1 drop/5 sec) to 450 μl oil phase (mineral oil) in the presence of 4.5% v/v Span 80, 0.4% v/v Tween 80 and 0.05% v/v Triton X-100 under constant stirring in a 2 ml round bottom biofreeze vial (Corning). After addition of the aqueous phase, stirring is continued for an additional 5 minutes. Emulsion reactions are aliquoted (100 μl) into thin-walled PCR tubes and thermocycled as indicated above.
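The quantities in the emulsification recipe above can be worked through as follows. This Python sketch assumes that the surfactant percentages are expressed relative to the 450 μl oil phase and that volumes are additive; both are assumptions made for illustration, and the names used are hypothetical.

```python
# Worked arithmetic for the emulsion recipe. ASSUMPTIONS: surfactant % v/v
# refers to the oil phase volume, and aqueous and oil volumes are additive.

AQUEOUS_UL = 150.0
OIL_PHASE_UL = 450.0
CELLS_PER_UL = 8e5
ALIQUOT_UL = 100.0
SURFACTANT_PERCENT_V_V = {"Span 80": 4.5, "Tween 80": 0.4, "Triton X-100": 0.05}


def surfactant_volumes_ul(oil_phase_ul: float, percent_v_v: dict) -> dict:
    """Volume of each surfactant contained in the oil phase, in microlitres."""
    return {name: oil_phase_ul * pct / 100.0 for name, pct in percent_v_v.items()}


def emulsion_summary(aqueous_ul: float, oil_ul: float,
                     cells_per_ul: float, aliquot_ul: float) -> dict:
    """Total cells, aqueous volume fraction and number of aliquots."""
    total_ul = aqueous_ul + oil_ul
    return {
        "total_cells": aqueous_ul * cells_per_ul,
        "aqueous_fraction": aqueous_ul / total_ul,
        "aliquots": total_ul / aliquot_ul,
    }


print(surfactant_volumes_ul(OIL_PHASE_UL, SURFACTANT_PERCENT_V_V))
print(emulsion_summary(AQUEOUS_UL, OIL_PHASE_UL, CELLS_PER_UL, ALIQUOT_UL))
```

Under these assumptions one emulsion contains about 1.2×10⁸ cells in a 1:3 aqueous to oil ratio and yields six 100 μl aliquots.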

Recovery of amplified products after emulsification is carried out as follows. After thermocycling, products are recovered by extraction with 2 volumes of diethyl ether, vortexed, and centrifuged for 10 minutes in a tabletop microfuge. Amplification products are analyzed as before.

Example 23 Minimizing Background Kinase Activity

Background kinase activity levels are determined by emulsifying E. coli TG1 cells in Taq buffer with substrates, as described above. It is found that native nucleoside diphosphate kinase from E. coli retained enough activity after the initial denaturation to provide significant kinase activity in our assay. The pUC19 expression plasmid containing the ndk gene is therefore transformed into an ndk deficient strain of E. coli, QL1387. Compared to a catalytic knockout mutant of mx ndk (H117A), the background kinase activity is determined to be negligible in our assay (amplified products could not be visualized by agarose gel electrophoresis) when ndk is expressed from the knockout strain.

Example 24 Maintenance of the Genotype-Phenotype Linkage in Emulsion

A catalytic knockout mutation (NDK H117A) of NDP kinase is co-emulsified with wild-type NDP kinase in equal amounts. The inactive mutant of ndk is distinguished by a smaller amplification product, since the 5′ and 3′ regions flanking the ORF downstream from the priming sites are removed during construction of the knockout mutant. Our emulsification procedure gives complete bias towards amplification of the active kinase, as determined by agarose gel electrophoresis.

Example 25 Method for the Parallel Genotyping of Heterogeneous Populations of Cells

The approach involves compartmentation of the cells in question in the emulsion (see WO9303151) together with PCR reagents etc. and polymerase. However, instead of linking genes derived from one cell by PCR assembly, one (or several) biotinylated primers are used as well as streptavidin coated polystyrene beads (or any other suitable means of linking primers onto beads). Thus, PCR fragments from one single cell are transferred to a single bead. Beads are pooled, interrogated for presence of a certain mutation or allele using fluorescently labelled probes (as described for “Digital PCR”) and counted by FACS. Multiplex PCR allows the simultaneous interrogation of 10 or maybe more markers. Single beads can also be sorted for sequencing.

Applications include, for example, diagnosis of asymptomatic tumors, which hinges on the detection of a very small number of mutant cells in a large excess of normal cells. The advantage of this method over cytostaining is through-put. Potentially 10⁸-10⁹ cells can be interrogated simultaneously.

Example 25 Short-Patch CSR

The present example relates to the selection of polymerases with low catalytic activity or processivity. Compartmentalized Self-Replication (CSR), as described, is a method of selecting polymerase variants with increased adaptation to distinct selection conditions. Mutants with increased catalytic activity have a selective advantage over ones that are less active under the selection conditions. However, for many selection objectives (e.g. altered substrate specificity) it is likely that intermediates along the evolutionary pathway to the new phenotype will have lowered catalytic activity. For example, from kinetic studies of E. coli DNA polymerase I, mutations such as E710A increased affinity and incorporation of ribonucleotides at the expense of lower catalytic rates and less affinity for wild-type substrates (deoxyribonucleotides) (F. B. Perler, S. Kumar, H. Kong, 1996, Adv. in Prot. Chem. 48, 377-430). The corresponding mutant of Taq DNA polymerase I, E615A, could incorporate ribonucleotides into PCR products more efficiently than wild-type polymerase. However, using wild-type substrates, it is only able to synthesize short fragments and not the full-length Taq gene, as analyzed by agarose gel electrophoresis. Therefore it would be difficult to select for this mutation by CSR. In another selection experiment in which β-glucuronidase is evolved into a β-galactosidase, the desired phenotype is obtained after several rounds of selection but at the expense of catalytic activity. It is also found that selected variants in the initial rounds of selection are able to catalyze the conversion of several different substrates not utilized by either parental enzyme, and at much lower catalytic rates (T. A. Steitz, 1999, J. Biol. Chem. 274, 17395-8).

In order to address the problem of being able to select polymerase variants with low catalytic activity or processivity such as may occur along an evolutionary trajectory to a desired phenotype, a variant of CSR, in which only a small region (a “patch”) of the gene under investigation is randomized and replicated, is employed. The technique is referred to as “short-patch CSR” (spCSR). spCSR allows for less active or processive polymerases to still become enriched during a round of selection by decreasing the selective advantage given to highly active or processive mutants. This method expands on the previously described method of compartmentalized self-replication, but, because the entire gene is not replicated, the short patch method is also useful, for example, for investigating specific domains independent of the rest of the protein.

There are many ways to introduce localised diversity into a gene. Among these are error-prone PCR (using manganese or synthetic bases, as described above for the Taq polymerase library), DNA shuffling (C. A. Brautigam, T. A. Steitz, 1998, Curr. Opin. Struct. Biol. 8, 54-63; Y. Li, S. Korolev, G. Waksman, 1998, EMBO J. 17, 7514-25), cassette mutagenesis (E. Bedford, S. Tabor, C. C. Richardson, 1997, Proc. Natl. Acad. Sci. USA 94, 479-84), and degenerate oligonucleotide directed mutagenesis (Y. Li, V. Mitaxov, G. Waksman, 1999, Proc. Natl. Acad. Sci. USA 96, 9491-6; M. Suzuki, D. Baskin, L. Hood, L. A. Loeb, 1996, Proc. Natl. Acad. Sci. USA 93, 9670-5) and its variants, e.g. sticky feet mutagenesis (J. L. Jestin, P. Kristensen, G. Winter, 1999, Angew. Chem. Int. Ed. 38, 1124-1127), and random mutagenesis by whole plasmid amplification (T. Oberholzer, M. Albrizio, P. L. Luisi, 1995, Chem. Biol. 2, 677-82). Combinatorial alanine scanning (A. T. Haase, E. F. Retzel, K. A. Staskus, 1990, Proc. Natl. Acad. Sci. USA 87, 4971-5) may be used to generate library variants to determine which amino acid residues are functionally important.

Structural (M. J. Embleton, G. Gorochov, P. T. Jones, G. Winter, 1992, Nucleic Acids Res. 20, 3831-7), sequence alignment (D. S. Tawfik, A. D. Griffiths, 1998, Nat. Biotechnol. 16, 652-656), and biochemical data from DNA polymerase I studies reveal regions of the gene involved in nucleotide binding and catalysis. Several possible regions to target include regions 1 through 6, as discussed in D. S. Tawfik, A. D. Griffiths, 1998, Nat. Biotechnol. 16, 652-656 (regions 3, 4, and 5 are also referred to as Motif A, B, and C, respectively, in Taq DNA polymerase I). Other possible targeted regions would be those regions conserved across several diverse species, those implicated by structural data to contact the nucleotide substrate or to be involved in catalysis or in proximity to the active site, or any other region important to polymerase function or substrate binding.

During a round of selection, each library variant is required to replicate only the region of diversity. This can be easily achieved by providing primers in a PCR reaction which flank the region diversified. CSR selections would be done essentially as described. After CSR selection, the short region which has been diversified and replicated is reintroduced into the starting gene (or another genetic framework, e.g. a library of mutants of the parent gene, a related gene etc.) using either appropriately situated restriction sites or PCR recombination methods like PCR shuffling or QuikChange mutagenesis. The spCSR cycle may be repeated many times and multiple regions could be targeted simultaneously or iteratively with flanking primers either amplifying individual regions separately or inclusively.

To increase stringency in selections at a later stage, spCSR is tunable simply by increasing the length of replicated sequence as defined by the flanking primers up to full length CSR. Indeed, for selection for processivity i.a. it may be beneficial to extend the replicated segment beyond the encoding gene to the whole vector using strategies analogous to iPCR (inverted PCR).

spCSR can have advantages over full length CSR not only when looking for polymerase variants with low activities or processivities but also when mapping discrete regions of a protein for mutability, e.g. in conjunction with combinatorial alanine scanning (A. T. Haase, E. F. Retzel, K. A. Staskus, 1990, Proc. Natl. Acad. Sci. USA 87, 4971-5) to determine which amino acid residues are functionally important. Such information may be useful at a later stage to guide semi-rational approaches, i.e. to target diversity to residues/regions not involved in core polymerase activity. Furthermore spCSR may be used to transplant polypeptide segments between polymerases (as with immunoglobulin CDR grafting). A simple swap of segments may lead initially to poorly active polymerases because of steric clashes and may require “reshaping” to integrate segments functionally. Reshaping may be done using either full length CSR (e.g. from existing random mutant libraries) or spCSR targeted to secondary regions (“Vernier zone” in antibodies).

Short patches may also be located at either N- or C-terminus as extensions to existing polymerase gene sequences or as internal insertions. Precedents for such phenotype modifying extensions and insertions exist in nature. For example both a C-terminal extension of T5 DNA pol and the thioredoxin-binding insertion in T7 DNA pol are critical for processivity in these enzymes and enable them to efficiently replicate the large (>30 kb) T-phage genomes. N- or C-terminal extensions have also been shown to enhance activity in other enzymes.

Example 26 Low Temperature CSR Using Klenow Fragment

Klenow fragment was cloned from E. coli genomic DNA into expression vector pASK75 (as with Taq) and expressed in E. coli strain DH5αZ1 (Lutz R. and Bujard H., 1997, Nucleic Acids Res. 25, 1203). Cells were washed and resuspended in 10 mM Tris pH 7.5. 2×10⁸ resuspended cells (20 μl) were added to 200 μl low temperature PCR buffer (LTP) (Iakobashvili, R. and Lapidot, A., 1999, Nucleic Acids Res. 27, 1566) and emulsified as described (Ghadessy et al., 2001, PNAS 98, 4552). LTP was 10 mM Tris (pH 7.5), 5.5 M L-proline, 15% w/v glycerol and 15 mM MgCl₂, plus suitable primers (because proline lowers melting temperature, primers need to be 40-mers or longer) and dNTPs. Low temperature PCR cycling was 70° C. 10 min, 50× (70° C. 30 sec, 37° C. 12 min). Aqueous phase was extracted as described and purified selection products reamplified as described (Ghadessy et al., 2001, PNAS 98, 4552).
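The low temperature cycling profile above implies a long run, which can be estimated from the hold times alone. The Python sketch below sums the programmed hold times and deliberately ignores ramp times between temperatures, so it gives a lower bound; the function name is illustrative only.

```python
# Lower-bound run time for a cycling program, from hold times only.
# Ramp times between temperatures are ignored (an explicit simplification).

def program_minutes(initial_hold_min: float, cycles: int, step_holds_min: list) -> float:
    """Total programmed hold time in minutes."""
    return initial_hold_min + cycles * sum(step_holds_min)


# 70 C for 10 min, then 50 x (70 C for 30 sec, 37 C for 12 min)
low_temperature_run = program_minutes(10.0, 50, [0.5, 12.0])
print(f"{low_temperature_run:.0f} min, about {low_temperature_run / 60:.1f} h")
```

On this basis the run occupies the thermocycler for about 10.6 hours, which is dominated by the 12 minute extension step at 37° C.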

All publications mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described methods and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are apparent to those skilled in molecular biology or related fields are intended to be within the scope of the following claims.

TABLE 3 Primer Sequences Used in Examples

Primer      Designation    Sequence (5′ to 3′)
Primer 1    TaqbaXba       GGCGACTCTAGATAACGAGGGCAAAAAATGCGTGGTATGCTTCCTCTTTTTGAGCCCAAGGG (SEQ ID NO: 1)
Primer 2    TaqfoSal       GCGGTGCGGAGTCGACTCACTCCTTGGCGGAGAGCCAGTCCTC (SEQ ID NO: 2)
Primer 3    88ba4          AAAAATCTAGATAACGAGGGCAA (SEQ ID NO: 3)
Primer 4    88fo2          ACCACCGAACTGCGGGTGACGCCAAGCG (SEQ ID NO: 4)
Primer 5    Taqba(scr)     GGGTACGTGGAGACCCTCTTCGGCC (SEQ ID NO: 5)
Primer 6    LMB2           GTAAAACGACGGCCAGT (SEQ ID NO: 6)
Primer 7    LMB3           CAGGAAACAGCTATGAC (SEQ ID NO: 7)
Primer 8    88ba4LMB3      CAGGAAACAGCTATGACAAAAATCTAGATAACGAGGGCAA (SEQ ID NO: 8)
Primer 9    88fo2LMB2      GTAAAACGACGGCCAGTACCACCGAACTGCGGGTGACGCCAAGCG (SEQ ID NO: 9)
Primer 10   LMB388ba5WA    CAG GAA ACA GCT ATG ACA AAA ATC TAG ATA ACG AGG GA (A-G mismatch) (SEQ ID NO: 10)
Primer 11   8fo2WC         GTA AAA CGA CGG CCA GTA CCA CCG AAC TGC GGG TGA CGC CAA GCC (C-C mismatch) (SEQ ID NO: 11)
Primer 12   LBA23          GAGTAGATGCTTGCTT TTCTGAGCC (SEQ ID NO: 12)
Primer 13   LF046          GCTCTGGT TATCTGCATC ATCGTCTGCC (SEQ ID NO: 13)

SEQUENCES

Thermostable Clone T7-88: Nucleotide SequenceAACCTTGGTATGCTTCCTCTTTTTGAGCCCAAGGGTCGCGTCCTCCTGGTGGACGGCC (SEQ ID NO:14) ACCACCTGGCCTACCGCACCTTCCACGCCCTGAAGGGCCTCACCACCAGCCGGGGGGAGCCGGTGCAGGCGGTCTACGGCTTCGCCAAGAGCCTCCTCAAGGCCCTCAAGGAGGACGGGGACGCGGTGATCGTGGTCTTTGACGCCAAGGCCCCCTCCTCCCGCCACGAGGCCTACGGGGGGTACAAGGCGGGCCGGGCCCCCACGCCGGAGGACTTTCCCCGGCAACTCGCCCTCATCAAGGAGCTGGTGGACCTCCTGGGGCTGGCGCGCCTCGAGGTCCCGGGCTACGAGGCGGACGACGTCCTGGCCAGCCTGGCCAAGAAGGCGGAAAAGGAGGGCTACGAGGTCCGCATCCTCACCGCCGACAAAGACCTTTACCAGCTCCTTTCCGACCGCATCCACGTCCTCCACCCCGAGGGGTACCTCATCACCCCGGCCTGGCTTTGGGAAAAGTACGGCCTGAGGCCCGACCAGTGGGCCGACTACCGGGCCCTGACCGGGGACGAGTCCGACAACCTTCCCGGGGTCAAGGGCATCGGGGAGAAGACGGCGAAGAAGCTTCTGGAGGAGTGGGGGAGCCTGGAAGCCCTCCTCGAGAACCTGGACCGGCTGAAGCCCGCCATCCGGGAGAAGATCCTGGCCCACACGGACGATCTGAAGCTCTCCTGGGACCTGGCCAAGGTGCGCACCGACCTGCCCCTGGAGGTGGACTTCGCCAAAAGGCGGGAGCCCGACCGGGAGAGGCTTAGGGCCTTTCTGGAGAGGCTTGAGTTTGGCAGCCTCCTCCACGAGTTCGGCCTTCTGGAAAGCCCCAAGGCCCTGGAGGAGGCCCCCTGGCCCCCGCCGGAAGGGGCCTTCGTGGGCTTTGTGCTTTCCCGCAAGGAGCCCATGTGGGCCGATCTTCTGGCCCTGGCCGCCGCCAGGGGGGGCCGGGTCCACCGGGCCCCCGAGCCTTATAAAGCCCTCAGGGACTTGAAGGAGGCGCGGGGGCTTCTCGCCAAAGACCTGAGCGTTCTGGCCCTAAGGGAAGGCCTTGGCCTCCCGCCCGGCGACGACCCCATGCTCCTCGCCTACCTCCTGGACCCTTCCAACACCACCCCCGAGGGGGTGGCCCGGCGCTACGGCGGGGAGTGGACGGAGGAGGCGGGGGAGCGGGCCGCCCTTTCCGAGAGGCTCTTCGCCAACCTGTGGGGGAGGCTTGAGGGGGAGGAGAGGCTCCTTTGGCTTTACCGGGAGGTGGAGAGGCCCCTTTCCGCTGTCCTGGCCCACATGGAGGCCACGGGGGTGCGCCTGGACGTGGCCTATCTCAGGGCCTTGTCCCTGGAGGTGGCCGAGGAGATCGCCCGCCTCGAGGCCGAGGTCTTCCGCCTGGCCGGCCACCCCTTCAACCTCAACTCCCGGGACCAGCTGGAAAGGGTCCTCTTTGACGAGCTAGGGCTTCCCGCCATCGGCAAGACGGAGAAGACCGGCAAGCGCTCCACCAGCGCCGCCGTCCTGGAGGCCCTCCGCGAGGCCCACCCCATCGTGGAGAAGATCCTGCAGTACCGGGAGCTCACCAAGCTGAAGAGCACCTACATTGACCCCTTGCCGGACCTCATCCACCCCAGGACGGGCCGCCTCCACACCCGCTTCAACCAGACGGCCACGGCCACGGGCAGGCTAAGTAGCTCCGATCCCAACCTCCAGAACATCCCCGTCCGCACCCCGCTTGGGCAGAGGATCCGCCGGGCCTTCATCGCCGAGGAGGGGTGGCTATTGGTGGTCCTGGACTATAGCCAGATAGAGCTCAGGGTGCTGGCCCACCTCTCCGGCGACGAGAACCTGATCCGGGTCTTCCAGGAGGGGCGGGACATCCACACGGAAACCGCCAGCTGGATGT
TCGGCGTCCCCCGGGAGGCCGTGGACCCCCTGATGCGCCGGGCGGCCAAGACCATCAACTTCGGGGTTCTCTACGGCATGTCGGCCCACCGCCTCTCCCAGGAGCTAGCCATCCCTTACGAGGAGGCCCAGGCCTTCATTGAGCGCTACTTTCAGAGCTTCCCCAAGGTGCGGGCCTGGATTGAGAAGACCCTGGAGGAGGGCAGGAGGCGGGGGTACGTGGAGACCCTCTTCGGCCGTCGCCGCTACGTGCCAGACCTAGAGGCCCGGGTGAAGAGCGTGCGGGAGGCGGCCGAGCGCATGGCCTTCAACATGCCCGTCCAGGGCACCGCCGCCGACCTCATGAAGCTGGCTATGGTGAAGCTCTTCCCCAGGCTGGAGGAAATGGGGGCCAGGATGCTCCTTCAGGTCCACGACGAGCTGGTCCTCGAGGCCCCAAAAGAGAGGGCGGAGGCCGTGGCCCGGCTGGCCAAGGAGGTCATGGAGGGGGTGTATCCCCTGGCCGTGCCCCTGGAGGTGGAGGTGGGGATAGGGGAGGACTGGCTCTCTGCCAAGGAGTGAG Thermostable Clone T7-88: Amino Acid SequenceMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKALKEDGDA (SEQ ID NO:15) VIVVFDAKAPSSRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLARLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITPAWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTAKKLLEEWGSLEALLENLDRLKPAIREKILAHTDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFLERLEFGSLLHEFGLLESPKALEEAPWPPPEGAFVGFVLSRKEPMWADLLALAAARGGRVHRAPEPYKALRDLKEARGLLAKDLSVLALREGLGLPPGDDPMLLAYLLDPSNTTPEGVARRYGGEWTEEAGERAALSERLFANLWGRLEGEERLLWLYREVERPLSAVLAHMEATGVRLDVAYLRALSLEVALEIARLEAEVFRLAGHPFNLNSRDQLERVLFDELGLPAIGKTEKTGKRSTSAAVLEALREAHPIVEKILQYRELTKLKSTYIDPLPDLIHPRTGRLHTRFNQTATATGRLSSSDPNLQNIPVRTPLGQRIRRAFIAEEGWLLVVLDYSQIELRVLAHLSGDENLIRVFQEGRDIHTETASWMFGVPREAVDPLMRRAAKTINFGVLYGMSAHRLSQELAIPYEEAQAFIERYFQSFPKVRAWIEKTLEEGRRRGYVETLFGRRRYVPDLEARVKSVREAAERMAFNMPVQGTAADLMKLAMVKLFPRLEEMGARMLLQVHDELVLEAPKERAEAVARLAKEVMEGVYPLAVPLEVEVGIGED WLSAKE*Thermostable Clone T9: Nucleic Acid SequenceGATGCTCCCTCTTTTTGAGCCCAAGGGTCGCGTCCTCCTGGTGGACGGCCACCACCT (SEQ ID NO:16) 
GGCCTACCGCACCTTCCACGCCCTGAAGGGCCTCACCACCAGCCGGGGGGAGCCGGTGCAGGCGGTCTACGGCTTCGCCAAGAGCCTCCTCAAGGCCCTCAAGGAGGACGGGGACGCGGTGATCGTGGTCTTTGACGCCAAGGCCCCCTCCTTCCGCCACGAGGCCTACGGGGGGTACAAGGCGGGCCGGGCCCCCACGCCGGAGGACTTTCCCCGGCAACTCGCCCTCATCAAGGAGCTGGTGGACCTCCTGGGGCTGGCGCGCCTCGAGGTCCCGGGCTACGAGGCGGACGACGTCCTGGCCAGCCTGGCCAAGAAGGCGGAAAAGGAGGGCTACGAGGTCCGCATCCTCACCGCCGACAAAGACCTTTACCAGCTCCTTTCCGACCGCATCCACGTCCTCCACCCCGAGGGGTACCTCATCACCCCGGCCTGGCTTTGGGAAAAGTACGGCCTGAGGCCCGACCAGTGGGCCGACTACCGGGCCCTGACCGGGGACGAGTCCGACAACCTTCCCGGGGTCAAGGGCATCGGGGAGAAGACGGCGAGGAAGCTTCTGGAGGAGTGGGGGAGCCTGGAAGCCCTCCTCAAGAACCTGGACCGGCTGAAGCCCGCCATCCGGGAGAAGATCCTGGCCCACATGGACGATCTGAAGCTCTCCTGGGACCTGGCCAAGGTGCGCACCGACCTGCCCCTGGAGGTGGACTTCGCCAAAAGGCGGGAGCCCGACCGGGAGAGGCTTGGGCCTTTCTGGAGAGGCTTGAGCTTGGCAGCCTCCTCCACGAGTTCGGCCTTCTGGAAAGCCCCAAGGCCCTGGAGGAGGCCTCCTGGCCCCCGCCGGAAGGGGCCTTCGTGGGCTTTGTGCTTTCCCGCAAGGAGCCCATGTGGGCCGATCTTCTGGCCCTGGCCGCCGCCAGGGGGGGCCGGGTCCACCGGGCCCCCGAGCCTTATAAAGCCCTCAGAGACCTGAAGGAGGCGCGGGGGCTTCTCGCCAAAGACCTGAGCGTTCTGGCCCTGAGGGAAGGCCTTGGCCTCCCGCCCGGCGACGACCCCATGCTCCTCGCCTACCTCCTGGACCCTTCCAACACCACCCCCGAGGGGGTGGCCCGGCGCTACGGCGGGGAGTGGACGGAGGAGGCGGGGGAGCGGGCCGCCCTTTCCGAGAGGCTCTTCGCCAACCTGTGGGGGAGGCTTGAGGGGGAGGAGAGGCTCCTTTGGCTTTACCGGGAGGTGGAGAGGCCCCTTTCCGCTGTCCTGGCCCACATGGAGGCCACGGGGGTGCGCCTGGACGTGGCCTATCTCAGGGCCTTGTCCCTGGAGGTGGCCGAGGAGATCGCCCGCCTCGAGGCCGAGGTCTTCCGCCTGGCCGGCCACCCCTTCAACCTCAACTCCCGAGACCAGCTGGAAAGGGTCCTCTTTGACGAGCTAGGGCTTCCCGCCATCGGCAAGACGGAGAAGACCGGCAAGCGCTCCACCAGCGCCGCCGTCCTGGAGGCCCTCCGCGAGGCCCACCCCATCGTGGAGAAGATCCTGCAGTACCGGGAGCTCACCAAGCTGAAGAGCACCTACATTGACCCCTTGCCGGACCTCATCCACCCCAGGACGGGCCGCCTCCACACCCGCTTCAACCAGACGGCCACGGCCACGGGCAGGCTAAGTAGCTCCGATCCCAACCTCCAGAACATCCCCGTCCGCACCCCGCTTGGGCAGAGGATCCGCCGGGCCTTCATCGCCGAGGAGGGGTGGCTATTGGTGGCCCTGGACTATAGCCAGATAGAGCTCAGGGTGCTGGCCCACCTCTCCGGCGACGAGAACCTGATCCGGGTCTTCCAGGAGGGGCGGGACATCCACACGGAGACCGCCAGCTGGATGTTCGGCGTCCCCCGGGAGGCCGTGGACCCCCTGATGCGCCGGGCGGCCAAGACCATCAACTTCGGGGTCCTCTACGGCATGTCGGCCACCGCCTCTCCCAGGAGCTAGCCATCCCTTACGAGGAGGCC
CAGGCCTTCATTGAGCGCTACTTTCAGAGCTTCCCCAAGGTGCGGGCCTGGATTGAGAAGACCCTGGAGGAGGGCAGGAGGCGGGGGTACGTGGAGACCCTCTTCGGCCGCCGCCGCTACGTGCCAGACCTAGAGGCCCGGGTGAAGAGCGTGCGGGAGGCGGCCGAGCGCATGGCCTTCAACATGCCCGTCCAGGGCACCGCCGCCGACCTCATGAAGCTGGCTATGGTGAAGCTCTTCCCCAGGCTGGAGGAAATGGGGGCCAGGATGCTCCTTCAGGTCCACGACGAGCTGGTCCTCGAGGCCCCAAAAGAGAGGGCGGAGGCCGTGGCCCGGCTGGCCAAGGAGGTCATGGAGGGGGTGTATCCCCTGGCCGTGCCCCTGGAGGTGGAGGTGGGGATAGGGGAGGACTGGCTCTCCGCCAAGGAGGGAGTCGACCTGCAGGCAGCGCTTGGCGTCACCCGCAGTTCGGTGGTACTGGCCGTCGTTTTACANN Thermostable Clone T9: Amino Acid SequenceMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKALKEDGDA (SEQ ID NO:17) VIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLARLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITPAWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALLKNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFLERLELGSLLHEFGLLESPKALEEASWPPPEGAFVGFVLSRKEPMWADLLALAAARGGRVHRAPEPYKALRDLKEARGLLAKDLSVLALREGLGLPPGDDPMLLAYLLDPSNTTPEGVARRYGGEWTEEAGERAALSERLFANLWGRLEGEERLLWLYREVERPLSAVLAHMEATGVRLDVAYLRALSLEVAEEIARLEAEVFRLAGHPFNLNSRDQLERVLFDELGLPAIGKTEKTGKRSTSAAVLEALREAHPIVEKILQYRELTKLKSTYIDPLPDLIHPRTGRLHTRFNQTATATGRLSSSDPNLQNIPVRTPLGQRIRRAFIAEEGWLLVALDYSQIELRVLAHLSGDENLIRVFQEGRDIHTETASWMFGVPREAVDPLMRRAAKTINFGVLYGMSAHRLSQELAIPYEEAQAFIERYFQSFPKVRAWIEKTLEEGRRRGYVETLFGRRRYVPDLEARVKSVREAAERMAFNMPVQGTAADLMKLAMVKLFPRLEEMGARMLLQVHDELVLEAPKERAEAVARLAKEVMEGVYPLAVPLEVEVGIGE DWLSAKEThermostable Clone T13: Amino Acid SequenceMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKALKEDGDA (SEQ ID NO:18) 
VIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLARLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITPAWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTAKKLLEEWGSLEALLENLDRLKPAIREKILAHTDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFLERLEFGSLLHEFGLLESPKALEEAPWPPPEGAFVGFVLSRKEPMWADLLALAAARGGRVHRAPEPYKALRDLKEARGLLAKDLSVLALREGLGLPPGDDPMLLAYLLDPSNTTPEGVARRYGGEWTEEAGERAALSERLFANLWGRLEGEERLLWLYREVERPLSAVLAHMEATGVRLDVAYLRALSLEVAEEIARLEAEVFRLAGHPFNLNSRDQLERVLFDELGLPAIGKTEKTGKRSTSAAVLEALREAHPIVEKILQYRELTKLKSTYIDPLPDLIHPRTGRLHTRFNQTATATGRLSSSDPNLQNIPVRTPLGQRIRRAFIAEEGWLLVVLDYSQIELRVLAHLSGDENLIRVFQEGRDIHTETASWMFGVPREAVDPLMRRAAKTINFGVLYGMSAHRLSQELAIPYEEAQAFIERYFQSFPKVRAWIEKTLEEGRRRGYVETLFGRRRYVPDLEARVKSVREAAERMAFNMPVQGTAADLMKLAMVKLFPRLEEMGARMLLQVHDELVLEAPKERAEAVARLAKEVMEGVYPLAVPLEVEVGIGED WLSAKEThermostable Clone 8 (T8): Nucleic Acid SequenceTCGTGGTACGCATCCTCTTTTTGAGCCCAAGGGCCGCGTCCTCCTGGTGGACGGCCA (SEQ ID NO:19) CCACCTGGCCTACCGCACCTTCCACGCCCTGAAGGGCCTCACCACCAGCCGGGGGGAGCCGGTGCAGGCGGTCTACGGCTTCGCCAAGAGCCTCCTCAAGGCCCTCAAGGAGGACGGGGACGCGGTGATCGTGGTCTTTGACGCCAAGGCCCCCTCCTCCCGCCACGAGGCCTACGGGGGGTACAAGGCGGGCCGGGCCCCCACGCCGGAGGACTTTCCCCGGCAACTCGCCCTCATCAAGGAGCTGGTGGACCTCCTGGGGCTGGCGCGCCTCGAGGTCCCGGGCTACGAGGCGGACGACGTCCTGGCCAGCCTGGCCAAGAAGGCGGAAAAGGAGGGCTATGAGGTCCGCATCCTCACCGCCGACAAAGACCTTTACCAGCTCCTTTCCGACCGCATCCACGTCCTCCACCCCGAGGGGTACCTCATCACCCCGGCCTGGCTTTGGGAAAAGTACGGCCTGAGGCCCGACCAGTGGGCCGACTACCGGGCCCTGACCGGGGACGAGTCCGACAACCTTCCCGGGGTCAAGGGCATCGGGGAGAAGACGGCGAAGAAGCTTCTGGAGGAGTGGGGGAGCCTGGAAGCCCTCCTCGAGAACCTGGACCGGCTGAAGCCCGCCATCCGGGAGAAGATCCTGGCCCACACGGACGATCTGAAGCTCTCCTGGGACCTGGCCAAGGTGCGCACCGACCTGCCCCTGGAGGTGGACTTCGCCAAAAGGCGGGAGCCCGACCGGGAGAGGCTTAGGGCCTTTCTGGAGAGGCTTGAGTTTGGCAGCCTCCTCCACGAGTTCGGCCTTCTGGAAAGCCCCAAGGCCCTGGAGGAGGCCCCCTGGCCCCCGCCGGAAGGGGCCTTCGTGGGCTTTGTGCTTTCCCGCAAGGAGCCCATGTGGGCCGATCTTCTGGCCCTGGCCGCCGCCAGGGGTGGCCGGGTCCACCGGGCCCCCGAGCCTTATAAAGCCCTCAGGGACTTGAAGGAGGCGCGGGGGCTTCTCGCCAAAGACCTGAGCGTTCTGGCCCTAAGGGAAGGCCTTGGCCTCCCGCCCGGCGACGACCCCATGCTCCTCGCCTACCTCCTGGACCCTTCCAACACCACCCCCGAGGG
GGTGGCCCGGCGCTACGGCGGGGAGTGGACGGAGGAGGCGGGGGAGCGGGCCGCCCTTTCCGAGAGGCTCTTCGCCAACCTGTGGGGGAGGCTTGAGGGGGAGGAGAGGCTCCTTTGGCTTTACCGGGAGGTGGATAGGCCCCTTTCCGCTGTCCTGGCCCACATGGAGGCCACAGGGGTGCGCCTGGACGTGGCCTATCTCAGGGCCTTGTCCCTGGAGGTGGCCGAGGAGATCGCCCGCCTCGAGGCCGAGGTCTTCCGCCTGGCCGGCCACCCCTTCAACCTCAACTCCCGGGACCAGCTGGAAAGGGTCCTCTTTGACGAGCTAGGGCTTCCCGCCATCGGCAAGACGGAGAAGACCGGCAAGCGCTCCACCAGCGCCGCCGTCCTGGAGGCCCTCCGCGAGGCCCACCCCATCGTGGAGAAGATCCTGCAGTACCGGGAGCTCACCAAGCTGAAGAGCACCTACATTGACCCCTTGCCGGACCTCATCCACCCCAGGACGGGCCGCCTCCACACCCGCTTCAACCAGACGGCCACGGCCACGGGCAGGCTAAGTAGCTCCGATCCCAACCTCCAGAACATCCCCGTCCGCACCCCGCTTGGGCAGAGGATCCGCCGGGCCTTCATCGCCGAGGAGGGGTGGCTATTGGTGGTCCTGGACTATAGCCAGATAGAGCTCAGGGTGCTGGCCCACCTCTCCGGCGACGAGAACCTGATCCGGGTCTTCCAGGAGGGGCGGGACATCCACACGGAAACCGCCAGCTGGATGTTCGGCGTCCCCCGGGAGGCCGTGGACCCCCTGATGCGCCGGGCGGCCAAGACCATCAACTTCGGGGTTCTCTACGGCATGTCGGCCCACCGCCTCTCCCAGGAGCTAGCCATCCCTTACGAGGAGGCCCAGGCCTTCATTGAGCGCTACTTTCAGAGCTTCCCCAAGGTGCGGGCCTGGATTGAGAAGACCCTGGAGGAGGGCAGGAGGCGGGGGTACGTGGAGACCCTCTTCGGCCGCCGCCGCTACGTGCCAGACCTAGAGGCCCGGGTGAAGAGCGTGCGGGAGGCGGCCGAGCGCATGGCCTTCAACATGCCCGTCCAGGGCACCGCCGCCGACCTCATGAAGCTGGCTATGGTGAAGCTCTTCCCCAGGCTGGAGGAAATGGGGGCCAGGATGCTCCTTCAGGTCCACGACGAGCTGGTCCTCGAGGCCCCAAAAGAGAGGGCGGAGGCCGTGGCCCGGCTGGCCAAGGAGGTCATGGAGGGGGTGTATCCCCTGGCCGTGCCCCTGGAGGTGGAGGTGGGGATAGGGGAGGACTGGCTCTCCGCCAAGGAGTGAGT Thermostable Clone 8 (T8): Amino Acid SequencePLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKALKEDGDAVI (SEQ ID NO:20) 
VVFDAKAPSSRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLARLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITPAWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTAKKLLEEWGSLEALLENLDRLKPATREKILAHTDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFLERLEFGSLLHEFGLLESPKALEEAPWPPPEGAFVGFVLSRKEPMWADLLALAAARGGRVHRAPEPYKALRDLKEARGLLAKDLSVLALREGLGLPPGDDPMLLAYLLDPSNTTPEGVARRYGGEWTEEAGERAALSERLFANLWGRLEGEERLLWLYREVDRPLSAVLAHMEATGVRLDVAYLRALSLEVAEEIARLEAEVFRLAGHPFNLNSRDQLERVLFDELGLPAIGKTEKTGKRSTSAAVLEALREAHPIVEKILQYRELTKLKSTYIDPLPDLIHPRTGRLHTRFNQTATATGRLSSSDPNLQNIPVRTPLGQRIRRAFIAEEGWLLVVLDYSQIELRVLAHLSGDENLIRVFQEGRDIHTETASWMFGVPREAVDPLMRRAAKTINFGVLYGMSAHRLSQELAIPYEEAQAFIERYFQSFPKVRAWIEKTLEEGRRRGYVETLFGRRRYVPDLEARVKSVREAAERMAFNMPVQGTAADLMKLAMVKLFPRLEEMGARMLLQVHDELVLEAPKERAEAVARLAKEVMEGVYPLAVPLEVEVGIGEDW LSAKE* Note:First two amino acids at N terminus not sequenced. Heparin ResistantClone 94: Nucleic Acid SequenceATTTTTGAGCCCAAGGGCCGCGTCCTCCTGGTGGACGGCCACCACCTGGCCTACCGC (SEQ ID NO:21) ACCTTCCACGCCCTGAAGGGCCTCACCACCAGCCGGGGGGAGCCGGTGCAGGCGGTCTACGGCTTCGCCAAGAGCCTCCTCAAGGCCCTCAAGGAGGACGGGGACGCGGTGATCGTGGTCTTTGACGCCAAGGCCCCCTCCTTCCGCCACGAGGCCTACGGGGGGTACAAGGCGGGCCGGGCCCCCACGCCGGAGGACTTTCCCCGGCAACTCGCCCTCATCAAGGAGCTGGTGGACCTCCTGGGGCTGGCGCGCCTCGAGGTCCCGGGCTACGAGGCGGACGACGTCCTGGCCAGCCTGGCCAAGAAGGCGGAAAAGGAGGGCTACGAGGTCCGCATCCTCACCGCCGACAAAGACCTTTACCAGCTCCTTTCCGACCGCATCCACGTCCTCCACCCCGAGGGGTACCTCATCACCCCGGCCTGGCTTTGGGAAAAGTACGGCCTGAGGCCCGACCAGTGGGCCGACTACCGGGCCCTGACCGGGGACGAGTCCGACAACCTTCCCGGTGTCAAGGGCATCGGGGAGAAGACGGCGAGGAAGCTTCTGGAGGAGTGGGGGAGCCTGGAAGCCCTCCTCAAGAACCTGGACCGGCTGGAGCCCGCCATCCGGGAGAAGATCCTGGCCCACATGGACGATCTGAAGCTCTCCTGGGACCTGGCCAAGGTGCGCACCGACCTGCCCCTGGAGGTGGACTTCGCCAAAAGGCGGGAGCCCGACCGGGAGAGGCTTAGGGCCTTTCTGGAGAGGCTTGAGTTTGGCAGCCTCCTCCACGAGTTCGGCCTTCTGGAAAGCCCCAAGGCCCCGGAGGAGGCCCCCTGGCCCCCGCCGGAAGGGGCCTTCGTGGGCTTTGTGCTTTCCCGCAAGGAGCCCATGTGGGCCGATCTTCTGGCCCTGGCCGCCGCCAGGGGGGGCCGGGTCCACCGGGCCCCCGAGCCTTATAAAGCCCTCAGGGACCTGAAGGAGGCGCGGGGGCTTCTCGCCAAAGACCTGAGCGTTCTGGCCCTGAGGGAAGGCCTTGGCCTCCCGCCCGGCGACGACCCCATGCTCC
TCGCCTACCTCCTGGACCCTTCCAACACCACCCCCGAGGGGGTGGCCCGGCGCTACGGCGGGGAGTGGACGGAGGAGGCGGGGGAGCGGGCCGCCCTTTCCGAGAGGCTCTTCGCCAACCTGTGGGGGAGGCTTGAGGGGGAGGAGAGGCTCCTTTGGCTTTACCGGGAGGTGGAGAGGCCCCTTTCCGCTGTCCTGGCCCACATGGAGGCCACGGGGGTGCGCCTGGACGTGTCCTATCTCAGGGCCTTGTCCCGGGAGGTGGCCGAGGAGATCGCCCGCCTCGAGGCCGAGGTCTTCCGCCTGGCCGGCCACCCCTTCAACCTCAACTCCCGGGACCAGCTGGAAAGGGTCCTCTTTGACGAGCTAGGGCTTCCCGCCATCGGCAAGACGGAGAAGACCGGCAAGCGCTCCACCAGCGCCGCCGTCCTGGAGGCCCTCCGCGAGGCCCACCCCATCGTGGAGAAGATCCTGCAGTACCGGGAGCTCACCAAGCTGAAGAGCACCTACATTGACCCCTTGCCGGACCTCATCCACCCCAGGACGGGCCGCCTCCACACCCGCTTCAACCAGACGGCCACGGCCACGGGCAGGCTAAGTAGCTCCGGTCCCAACCTCCAGAGCATCCCCGTCCGCACCCCGCTTGGGCAGAGGATCCGCCGGGCCTTCATCGCCGAGGAGGGGTGGCTATTGGTGGCCCTGGACTATAGCCAGATAGAGCTCAGGGTGCTGGCCCACCTCTCCGGCGACGAGAACCTGATCCGGGTCTTCCAGGAGGGGCGGGACATCCACACGGAGACCGCCAGCTGGATGTTCGGCGTCCCCCGGGAGGCCGTGGACCCCCTGATGCGCCGGGCGGCCAAGACCATCAACTTCGGGGTCCTCTACGGCATGTCGGCCCACCGCCTCTCCCAGGAGCTAGCCATCCCTTACGAGGAGGCCCAGGCCTTCATTGAGCGCTACTTTCAGAGCTTCCCCAAGGTGCGGGCCTGGATTGAGAAGACCCTGGAGGAGGGCAGGAGGCGGGGGTACGTGGAGACCCTCTTCGGCCGCCGCCGCTACGTGCCAGACCTAGAGGCCCGGGTGAAGAGCGTGCGGGAGGCGGCCGAGCGCATGGCCTTCAACATGCCCGTCCAGGGCACCGCCGCCGACCTCATGAAGCTGGCTATGGTGAAGCTCTTCCCCAGGCTGGAGGAAATGGGGGCCAGGATGCTCCTTCAGGTCCACGACGAGCTGGTCCTCGAGGCCCCAAAAGAGAGGGCGGAGGCCGTGGCCCGGCTGGCCAAGGAGGTCATGGAGGGGGTGTATCCCCTGGCCGTGCCCCTGGAGGTGGAGGTGGGGATAGGGGAGGACTGGCTCTCCGC CAAGGAGTGATTHeparin Resistant Clone H94: Amino Acid SequenceFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKALKEDGDAVIVV (SEQ ID NO:22) 
FDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLARLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITPAWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALLKNLDRLEPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFLERLEFGSLLHEFGLLESPKAPEEAPWPPPEGAFVGFVLSRKEPMWADLLALAAARGGRVHRAPEPYKALRDLKEARGLLAKDLSVLALREGLGLPPGDDPMLLAYLLDPSNTTPEGVARRYGGEWTEEAGERAALSERLFANLWGRLEGEERLLWLYREVERPLSAVLAHMEATGVRLDVSYLRALSREVAEEIARLEAEVFRLAGHPFNLNSRDQLERVLFDELGLPAIGKTEKTGKRSTSAAVLEALREAHPIVEKILQYRELTKLKSTYIDPLPDLIHPRTGRLHTRFNQTATATGRLSSSGPNLQSIPVRTPLGQRIRRAFIAEEGWLLVALDYSQIELRVLAHLSGDENLIRVFQEGRDIHTETASWMFGVPREAVDPLMRRAAKTINFGVLYGMSAHRLSQELAIPYEEAQAFIERYFQSFPKVRAWIEKTLEEGRRRGYVETLFGRRRYVPDLEARVKSVREAAERMAFNMPVQGTAADLMKLAMVKLFPRLEEMGARMLLQVHDELVLEAPKERAEAVARLAKEVMEGVYPLAVPLEVEVGIGEDWLS AKE* Note.N-TERMINAL 5 amino acids not determined. Heparin Resistant Clone 15:Nucleic Acid SequenceTTTGAGCCCAAGGGCCGCGTCCTCCTGGTGGACGGCCACCACCTGGCCTACCGCACC (SEQ ID NO:23) TTCCACGCCCTGAAGGGCCTCACCACCAGCCGGGGGGAGCCGGTGCAGGCGGTCTACGGCTTCGCCAAGAGCCTCCTCAAGGCCCTCAAGGAGGACGGGGACGCGGTGATCGTGGTCTTTGACGCCAAGGCCCCCTCCTTCCGCCACGAGGCCTACGGGGGGTACAAGGCGGGCCGGGCCCCCACGCCGGAGGACTTTCCCCGGCAACTCGCCCTCATCAAGGAGCTGGTGGACCTCCTGGGGCTGGCGCGCCTCGAGGTCCCGGGCTACGAGGCGGACGACGTCCTGGCCAGCCTGGCCAAGAAGGCGGAAAAGGAGGGCTACGAGGTCCGCATCCTCACCGCCGACAAAGACCTTTACCAGCTCCTTTCCGACCGCATCCACGTCCTCCACCCCGAGGGGTACCTCATCACCCCGGCCTGGCTTTGGGAAAAGTACGGCCTGAGGCCCGACCAGTGGGCCGACTACCGGGCCCTGACCGGGGACGAGTCCGACAACCTTCCCGGTGTCAAGGGCATCGGGGAGAAGACGGCGAGGAAGCCTTCTGGAGGAGTGGGGGAGCCTGGAAGCCCTCCTCAAGAACCTGGACCGGCTGGAGCCCGCCATCCGGGAGAAGATCCTGGCCCACATGGACGATCTGAAGCTCTCCTGGGACCTGGCCAAGGTGCGCACCGACCTGCCCCTGGAGGTGGACTTCGCCAAAAGGCGGGAGCCCGACCGGGAGAGGCTTAGGGCCTTTCTGGAGAGGCTTGAGTTTGGCAGCCTCCTCCACGAGTTCGGCCTTCTGGAAAGCCCCAAGGCCCTGGAGGAGGCCCCCTGGCCCCCGCCGGAAGGGGCCTTCGTGGGCTTTGTGCTTTCCCGCAAGGAGCCCATGTGGGCCGATCTTCTGGCCCTGGCCGCCGCCAGGGGGGGTCGGGTCCACCGGGCCCCCGAGCCTTATAAAGCCCTCAGGGACCTGAAGGAGGCGCGGGGGCTTCTCGCCAAAGACCTGAGCGTTCTGGCCCTGAGGGAAGGCCTTGGCCTCCCGCCCGGCGACGACCCCATGCTCCTCGCCTACCTCCTG
GACCCTTCCAACACCACCCCCGTGGGGGTGGCCCGGCGCTACGGCGGGGAGTGGACGGAGGAGGCGGGGGAGCGGGCCGCCCTTTCCGAGAGGCTCTTCGCCAACCTGTGGGGGAGGCTTGAGGGGGAGGAGAGGCTCCTTTGGCTTTACCGGGAGGTGGAGAGGCCCCTTTCCGCTGTCCTGGCCCACATGGAGGCTACGGGGGTGCGCCTGGACGTGGCCTATCTCAGGGCCTTGTCCCTGGAGGTGGCCGAGGAGATCGCCCGCCTCGAGGCCGAGGTCTTCCGCCTGGCCGGCCACCCCTTCAACCTCAACTCCCGGGACCAGCTGGAAAGGGTCCTCTTTGACGAGCTAGGGCTTCCCGCCATCGGCAAGACGGAGAAGACCGGCAAGCGCTCCACCAGCGCCGCCGTCCTGGAGGCCCTCCGCGAGGCCCACCCCATCGTGGAGAAGATCCTGCAGTACCGGGAGCTCACCAGGCTGAAGAGCACCTACATTGACCCCTTGCCGGACCTCATCCACCCCAGGACGGGCCGCCTCCACACCCGCTTCAACCAGACGGCCACGGCCACGGGCAGGCTAAGTAGCTCCGGTCCCAACCTCCAGAGCATCCCCGTCCGCACCCCGCTTGGGCAGAGGATCCGCCGGGCCTTCATCGCCGAGGAGGGGTGGCTATTGGTGGCCCTGGACTATAGCCAGATAGAGCTCAGGGTGCTGGCCCACCTCTCCGGCGACGAGAACCTGATCCGGGTCTTCCAGGAGGGGCGGGACATCCACACGGAGACCGCCAGCTGGATGTTCGGCGTCCCCCGGGAGGCCGTGGACCCCCTGATGCGCCGGGCGGCCAAGACCATCAACTTCGGGGTCCTCTACGGCATGTCGGCCCACCGCCTCTCCCAGGAGCTAGCCATCCCTTACGAGGAGGCCCAGGCCTTCATTGAGCGCTACTTTCAGAGCTTCCCCAAGGTGCGGGCCTGGATTGAGAAGACCCTGGAGGAGGGCAGGAGGCGGGGGTACGTGGAGACCCTCTTCGGCCGCCGCCGCTACGTGCCAGACCTAGAGGCCCGGGTGAAGAGCGTGCGGGAGGCGGCCGAGCGCAGGGCCTTCAACATGCCCGTCCAGGGCACCGCCGCCGACCTCATGAAGCTGGCTATGGTGAAGCTCTTCCCCAGGCTGGAGGAAATGGGGGCCAGGATGCTCCTTCAGGTCCACGACGAGCTGGTCCTCGAGGCCCCAAAAGAGAGGGCGGAGGCCGTGGCCCGGCTGGCCAAGGAGGTCATGGAGGGGGTGTATCCCCTGGCCGTGCCCCTGGAGGTGGAGGTGGGGATAGGGGAGGACTGGCTCTCCGCCAA GGAGTGAGTHeparin Resistant Clone 15: Amino Acid SequencePLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKALKEDGDAVI (SEQ ID NO:24) 
VVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLARLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITPAWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALLKNLDRLEPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFLERLEFGSLLHEFGLLESPKALEEAPWPPPEGAFVGFVLSRKEPMWADLLALAAARGGRVHRAPEPYKALRDLKEARGLLAKDLSVLALREGLGLPPGDDPMLLAYLLDPSNTTPVGVARRYGGEWTEEAGERAALSERLFANLWGRLEGEERLLWLYREVERPLSAVLAHMEATGVRLDVAYLRALSLEVAEEIARLEAEVFRLAGHPFNLNSRDQLERVLFDELGLPAIGKTEKTGKRSTSAAVLEALREAHPIVEKILQYRELTRLKSTYIDPLPDLIHPRTGRLHTRFNQTATATGRLSSSGPNLQSIPVRTPLGQRIRRAFIAEEGWLLVALDYSQIELRVLAHLSGDENLIRVFQEGRDIHTETASWMFGVPREAVDPLMRRAAKTINFGVLYGMSAHRLSQELAIPYEEAQAFTERYFQSFPKVRAWIEKTLEEGRRRGYVETLFGRRRYVPDLEARVKSVREAAERRAFNMPVQGTAADLMKLAMVKLFPRLEEMGARMLLQVHDELVLEAPKERAEAVARLAKEVMEGVYPLAVPLEVEVGIGEDW LSAKE* Note:N-terminal 5 amino acids not determined. Mismatch Extension Clone M1:Nucleic Acid SequenceTTGGAATGCTCCCTCTTTTTGAGCCCAAAGGCCGCGTCCTCCTGGTGGACGGCCACC (SEQ ID NO:25) ACCTGGCCTACCGCACCTTCCACGCCCTGAAGGGCCTCACCACCAGCCGGGGGGAGCCGGTGCAGGCGGTCTACGGCTTCGCCAAGAGCCTCCTCAAGGCCCTCAAGGAGGACGGGGACGCGGTGATCGTGGTCTTTGACGCCAAGGCCCCCTCCTTCCGCCACGAGGCCTACGGGGGGTACAAGGCGGCCCGGGCCCCCACGCCGGAGGACTTTCCCCGGCAACTCGCCCTCATCAAGGAGCTGGTGGATCTCCTGGGGCTGGCGCGCCTCGAGGTCCCGGGCTACGAGGCGGACGACGTCCTGGCCAGCCTGGCCAAGAAGGCGGAAAAGGAGGGCTACGAGGTCCGCATCCTCACCGCCGACAAAGGCCTTTACCAGCTCCTTTCCGACCGCATCCACGTCCTCCACCCCGAGGGGTACCTCATCACCCCGGCCTGGCTTTGGGAAAAGTACGGCCTGAGGCCCGACCAGTGGGCCGACTACCGGGCCCTGACCGGGGACGAGTCCGACAACCTTCCCGGGGTCAAGGGCATCGGGGAGAAGACGGCGAGGAAGCTTCTGGAGGAGTGGGGGAGCCTGGAAGCCCTCCTCAAGAACCTGGACCGGCTGAAGCCCGCCATCCGGGAGAAGATCCTGGCCCACATGGACGATCTGAAGCTCTCCTGGGATCTGGCCAAGGTGCGCACCGACCGCCCCTGOAGGTGGACTTCGCCAAAAGGCGGGAGCCCGACCGGGAGAGGCTTAGGGCCTTTCTGGAGAGGCTTGAGTTTGGCAGCCTCCTCCACGAGTTCGGCCTTCTGGAAAGCCCCAAGGCCCTGGAGGAGGCCCCCTGGCCCCCGCCGGAAGGGGCCTTCGTGGGCTTTGTCCTTTCCCGCAGGGAGCCCATGTGGGCCGATCTTCTGGCCCTGGCCGCCGCCAGGGGGGGCCGGGTCCACCGGGCCCCCGAGCCTTATAAAGCCCTCAGGGACCTGAAGGAGGCGCGGGGGCTTCTCGCCAAAGACCTGAGCGTTCTGGCCCTGAGGGAAGGCCTTGGCCTCCCGCCCGGCGACGACCCCATG
CTCCTCGCCTACCTCCTGGACCCTTCCAACACCACCCCCGAGGGGGTGGCCCGGCGCTACGGCGGGGAGTGGACGGAGGAGGCGGGGGAGCGGGCCGCCCTTTCCGAGAGGCTCTTCGCCAACCTGTGGGGGAGGCTTGAGGGGGAGGAGAGGCTCCTTTGGCTTTACCGGGAGGTGGAGAGGCCCCTTTCCGCTGTCCTGGCCCACATGGAGGCCACGGGGGTGCGCCTGGACGTGGCCTATCTCAGGGCCTTGTCCCTGGAGGTGGCCGAGGAGATCGCCCGCCTCGAGGCCGAGGTCTTCCGCCTGGCCGGCCACCCCTTCAACCTCAACTCCCGGGACCAGCTGGAAAGGGTCCTCTTTGACGAGCTAGGGCTTCCCGCCATCGGCAAGACGGAGAAGACCGGCAAGCGCTCCACCAGCGCCGCCGTCCTGGGGGCCCTCCGCGAGGCCCACCCCATCGTGGAGAAGATCCTGCAGTACCGGGAGCTCACCAAGCTGAAGAGCACCTACATTGACCCCTTACCGGACCTCATCCACCCCAGGACGGGCCGCCTCCACACCCGCTTCAACCAGACGGCCACGGCCACGGGCAGGCTAAGTAGCTCCGATCCCAACCTCCAGAACATCCCCGTCCGCACCCCGCTTGGGCAGAGGATCCGCCGGGCCTTCATCGCCGAGGAGGGGTGGCTATTGGTGGTCCTGGACTATAGCCAGATAGAGCTCAGGGTGCTGGCCCACCTCTCCGGCGACGAGAACCTGATCCGGGTCTTCCAGGAGGGGCGGGACATCCACACGGAGACCGCCAGCTGGATGTTCGGCGTCCCCCGGGAGGCCGTGGACCCCCTGATGCGCCGGGCGGCCAAGACCATCAACTTCGGGGTCCTCTACGGCATGTCGGCCCACCGCCTCTCCCAGGAGCTAGCCATCCCTTACGAGGAGGCCCAGGCCTTCATTGAGCGCTACTTTCAGAGCTTCCCCAAGGTGCGGGCCTGGATTGAGAAGACCCTGGAGGAGGGCAGGAGGCGGGGGTACGTGGAGACCCTCTTCGGCCGCCGCCGCTACGTGCCAGACCTAGAGGCCCGGGTGAAGAGCGTGCGGGGGGCGGCCGAGCGCATGGCCTTCAACATGCCCGTCCAGGGCACCGCCGCCGACCTCATGAAGCTGGCTATGGTGAAGCTCTTCCCCAGGCTGGAGGAAATGGGGGCCAGGATGCTCCTTCAGGTCCACGACGAGCTGGTCCTCGAGGCCCCAAAAGAGAGGGCGGAGGCCGTGGCCCGGCTGGCCAAGGAGGTCATGGAGGGGGTGTATCCCCTGGCCGTGCCCCTGGAGGTGGAGGTGGGGATAGGGGAGGACTGGCTCTCCGCCAAGGAGTGAGTCGACCTGCAGGCAGCGCTTGGCGTCACCCGCAGTTCGGTGGTTAATAAGCTTGACCTGTGAAGTGAAAAATGGCGCACATTGTGCGACATTTTTTTTGTCTGCCGTTTACCGCTACTGCGTCACGGATCTCCACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGGTTCCCGATTTAGTGCTTTTACGGGACCTCGAACCCAAAAAATTGATTAGG Mismatch Extension Clone M1: Amino AcidSequence GMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKALKEDGD (SEQID NO: 26) 
AVIVVFDAKAPSFRHEAYGGYKAARAPTPEDFPRQLALIKELVDLLGLARLEVPGYEADDVLASLAKKAEKEGYEVRILTADKGLYQLLSDRIHVLHPEGYLITPAWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALLKNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFLERLEFGSLLHEFGLLESPKALEEAPWPPPEGAFVGFVLSRREPMWADLLALAAARGGRVHRAPEPYKALRDLKEARGLLAKDLSVLALREGLGLPPGDDPMLLAYLLDPSNTTPEGVARRYGGEWTEEAGERAALSERLFANLWGRLEGEERLLWLYREVERPLSAVLAHMEATGVRLDVAYLRALSLEVALEIARLEAEVFRLAGHPFNLNSRDQLERVLFDELGLPAIGKTEKTGKRSTSAAVLGALREAHPIVEKILQYRELTKLKSTYIDPLPDLIHPRTGRLHTRFNQTATATGRLSSSDPNLQNIPVRTPLGQRIRRAFIAEEGWLLVVLDYSQIELRVLAHLSGDENLIRVFQEGRDIHTETASWMFGVPREAVDPLMRRAAKTINFGVLYGMSAHRLSQELAIPYEEAQAFIERYFQSFPKVRAWIEKTLEEGRRRGYVETLFGRRRYVPDLEARVKSVRGAAERMAFNMPVQGTAADLMKLAMVKLFPRLEEMGARMLLQVHDELVLEAPKERAEAVARLAKEVMEGVYPLAVPLEVEVGIGE DWLSAKENote:N-terminal 2 amino acids not determined. Mismatch Extension CloneM4: Nucleic Acid SequenceTCTTTATGAGCCCAAGGGCCGCGTCCTCCTGGTGGACGGCCACCACCTGGCCTACCG (SEQ ID NO:27) CACCTTCCACGCCCTGAAGGGCCTCACCACCAGCCGGGGGGAGCCGGTGCAGGCGGTCTACGGCTTCGCCAAGAGCCTCCTCAAGGCCCTCAAGGAGGGCGGGGACGCGGTGATCGTGGTCTTTGACGCCAAGGCCCCCTCCTTCCCCCATGAGGCCTACGGGGGGTACAAGGCGGGCCGGGCCCCCACGCCGGAGGACTTTCCCCGACAACTCGCCCTCATCAAGGAGCTGGTGGACCTCCTGGGGCTGACGCGCCTCGAGGTCCCGGGCTACGAGGCGGACGACGTCCTGGCCAGCCTGGCCAAGAAGGCGGAAAAGGAGGGCTACGAGGTCCGCATCCTCACCGCCGACAAAGACCTTTACCAGCTCCTTTCCGACCGCATCCACGTCCTCCACCCCGAGGGGTACCTCATCACCCCGGCCTGGCTTTGGGAAAAGTACGGCCTGAGGCCCGACCAGTGGGCCGACTACCGGGCCCTGACCGGGGACGAGTCCGACAACCTTCCCGGGGTCAAGGGCATCGGGGAGAAGACGGCGAGGAAGCTTCTGGAGGAGTGGGGGAGCCTGGAAGCCCTCCTCAAGAACCTGGACCGGCTGAAGCCCGCCATCCGGGAGAAGATCCTGGCCCACATGGACGATCTGAAGCTCTCCTGGGACCGGGCCAAGGTGCGCACCGACCTGCCCCTGGAGGTGGACTTCGCCAAAAGGCGGGAGCCCGACCGGGAGAGGCTTAGGGCCTTTCTGGAGAGGCTTGAGTTTGGCAGCCTCCTCCACGAGTTCGGCCTTCTGGAAAGCCCCAAGGCCCTGGAGGAGGCCCCCTGGCCCCCGCCGGAAGGGGCCTTCGTGGGCTTTGTGCTTTCCCGCAAGGAGCCCATGTGGGCCGATCTTCTAGCCCTGGCCGCCGCCAGGGGGGGCCGGGTCCACCGGGCCCCCGAGCCTTATAAAGCCCTCGGGGACCTGAAGGAGGCGCGGGGGCTTCTCGCCAAAGACCTGAGCGTTCTGGCCCTGAGGGAAGGCCTTGGCCTCCCGCCCGACGACGACCCCATGCTCCTCGCCTA
CCTCCTGGACCCTTCCAACACCACCCCCGAGGGGGTGGCCCGGCGCTACGGCGGGGAGTGGACGGAGGAGCGAGGGGAGCGGGCCGCCCTTTCCGAGAGGCTCTTCGCCAACCTGTGGGGGAGGCTTGAGGGGGAGGAAAGGCTCCTTTGGCTTTACCGGGAGGTGGAGAGGCCCCTTTCCGCTGTCCTGGCCCACATGGAGGCCACGGGGGTGCGCCTGGACGTGGCCTATCTCAGGGCCTTGTCCCTGGAGGTGGCCGAGGAGATCGCCCGCCTCGAGGCCGAGGTCTTCCGCCTGGCCGGCCACCCCTTCAACCTCAACTCCCGGGACCAGCTGGAAAGGGTCCTCTTTGACGAGCTAGGGCTTCCCGCCATCGGCAAGACGGAGAAGACCGGCAAGCGCTCCACCAGCGCCGCCGTCCTGGGGGCCCTCCGCGAGGCCCACCCCATCGTGGAGAAGATCCTGCAGTACCGGGAGCTCACCAAGCTGAAGAGCACCTACATTGACCCCTTGCCGGACCTCATCCACCCCAGGACGGGCCGCCTCCACACCCGCTTCAACCAGACGGCCACGGCCACGGGCAGGCTAAGTAGCTCCGATCCCAACCTCCAGAGCATCCCCGTCCGCACCCCGCTTGGGCAGAGGATCCGCCGGGCCTTCATCGCCGAGGAGGGGTGGCTATTGGTGGCCCTGGACTATAGCCAGATAGAGCTCAGGGTGCTGGCCCACCTCTCCGGCGACGAGAACCTGATCCGGGTCTTCCAGGAGGGGCGGGACATCCACACGGAGACCGCCAGCTGGATGTTCGGCGTCCCCCGGGAGGCCGTGGACCCCCTGATGCGCCGGGCGGCCAAGACCATCAACTTCGGGGTCCTCTACGGCATGTCGGCCCACCGCCTCTCCCAGGAGCTAGCCATCCCTTACGAGGAGGCCCAGGCCTTCATTAAGCGCTACTTTCAGAGCTTCCCCAAGGTGCGGGCCTGGATTGAGAAGACCCTGGAGGAGGGCAGGAGGCGGGGGTACGTGGAGACCCTCTTCGGCCGCCGCCGCTACGTGCCAGACCTAGAGGCCCGGGTGAAGAGCGTGCGGGAGCCGGCCGAGCGCATGGCCTTCAACATGCCCGTCCAGGGTACCGCCGCCGACCTCATGAAGCTGGCTATGGTGAAGCTCTTCCCCAGGCTGGAGGAAATGGGGGCCAGGATGCTCCTTCAGGTCCACGACGAGCTGGTCCTCGAGGCCCCAAAAGAGAGGGCGGAGGCCGTGGCCCGGCTGGCCAAGGAGGTCATGGAGGGGGTGTATCCCCTGGCCGTGCCCCTGGAGGTGGAGGTGGGGATAGGGGAGGACTGGCTCTCCG CCAAGGAGTGAGTMismatch Extension Clone M4: Amino Acid SequenceLYEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKALKEGGDAVIV (SEQ ID NO:28) 
VFDAKAPSFPHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLTRLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITPAWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALLKNLDRLKPAIREKILAHMDDLKLSWDRAKFRTDLPLEVDFAKRREPDRERLRAFLERLEFGSLLHEFGLLESPKALEEAPWPPPEGAFVGFVLSRKEPMWADLLALAAARGGRVHRAPEPYKALGDLKEARGLLAKDLSVLALREGLGLPPDDDPMLLAYLLDPSNTTPEGVARRYGGEWTEEAGERAALSERLFANLWGRLEGEERLLWLYREVERPLSAVLAHMEATGVRLDVAYLRALSLEVALEIARLEAEVFRLAGHPFNLNSRDQLERVLFDELGLPAIGKTEKTGKRSTSAAVLGALREAHPIVEKILQYRELTKLKSTYIDPLPDLIHPRTGRLHTRFNQTATATGRLSSSDPNLQSIPVRTPLGQRIRRAFIAEEGWLLVALDYSQIELRVLAHLSGDENLIRVFQEGRDIHTETASWMFGVPREAVDPLMRRAAKTINFGVLYGMSAHRLSQELAIPYEEAQAFIKRYFQSFPKVRAWIEKTLEEGRRRGYVETLFGRRRYVPDLEARVKSVREPAERMAFNMPVQGTAADLMKLAMVKLFPRLEEMGARMLLQVHDELVLEAPKERAEAVARLAKEVMEGVYPLAVPLEVEVGIGEDWL SAKE Note:N-terminal 6 amino acids not determined.

1. A nucleic acid processing (NAP) enzyme identified by a method comprising the steps of: (a) providing a pool of nucleic acids comprising members encoding a NAP enzyme or a variant of a NAP enzyme; (b) subdividing the pool of nucleic acids into compartments, such that each compartment comprises a nucleic acid member of the pool together with the NAP enzyme or variant encoded by the nucleic acid member; (c) allowing nucleic acid processing to occur; and (d) detecting processing of the nucleic acid member by the NAP enzyme, whereby a NAP enzyme is selected.
 2. The NAP enzyme of claim 1, wherein said NAP enzyme is a variant of a known NAP enzyme, and wherein said variant has greater thermostability than said known NAP enzyme.
 3. The NAP enzyme of claim 1, wherein said NAP enzyme is a variant of a known NAP enzyme, and wherein said variant is inhibited to a lesser extent by heparin than is said known NAP enzyme.
 4. The NAP enzyme of claim 3 which is a Taq polymerase active at a concentration of 0.083 units/μl or more of heparin.
 5. The NAP enzyme of claim 1 which is a Taq polymerase active at a concentration of 0.083 units/μl or more of heparin.
 6. The NAP enzyme of claim 1 which is a replicase enzyme that extends a primer having a 3′ mismatch.
 7. The NAP enzyme of claim 1 which is a replicase enzyme that extends a primer having a 3′ unnatural base.
 8. The NAP enzyme of claim 7 which is capable of extending a primer having a 3′ terminal base comprising 5-nitroindole or 3-carboxyamide-5-nitroindole.
 9. The NAP enzyme of claim 1 which is a replicase enzyme variant of a known replicase enzyme, wherein said variant incorporates α-thio dNTPs as nucleotide substrates more efficiently than said known replicase enzyme.
 10. The NAP enzyme of claim 1 which is a replicase enzyme variant of a known replicase enzyme, wherein said variant replicates a substrate 23 kb in size in the absence of processivity factors or a 3′-5′ exonuclease proof-reading domain.
 11. The NAP enzyme of claim 6 in which the 3′ mismatch is a 3′ purine-purine mismatch or a 3′ pyrimidine-pyrimidine mismatch.
 12. The NAP enzyme of claim 6 in which the 3′ mismatch is an A-G mismatch or in which the 3′ mismatch is a C-C mismatch.
 13. A Taq polymerase mutant comprising a mutation selected from the group consisting of G84A, D144G, K314R, E520G, A608V and E742G.
 14. A Taq polymerase mutant comprising G84A, D144G, K314R, E520G, A608V and E742G mutations.